DIFFERENTIAL TAGGING OF POLYMERS
FOR HIGH RESOLUTION LINEAR ANALYSIS 

Related Applications 

This application claims priority to U.S. Provisional Patent Application filed
September 18, 2001, entitled "DIFFERENTIAL TAGGING OF POLYMERS FOR HIGH
RESOLUTION LINEAR ANALYSIS", Serial No. 60/322,981, the contents of which are
incorporated by reference herein in their entirety.

Field of the Invention

The invention relates to linear analysis of sequence information for polymers such as 
biological polymers, and provides improved spatial resolution of signal detection systems. 

Background of the Invention 

Sequence analysis of polymers has many practical applications. Of great interest
recently is the ability to sequence the genomes of various organisms, including the human
genome. Specific sequences can be recognized with a host of sequence-specific tagging
methods such as various types of probes, engineered proteins, and also synthetic compounds.
In any of these sequence-specific tagging approaches, there is always a need to resolve
adjacent tags, in order to achieve higher resolution and thus map as much of the polymer as
possible.

Linear analysis of DNA can be accomplished by analysis of fixed DNA molecules,
analysis of moving DNA molecules, and analysis of DNA molecules using readers such as
molecular motors or proteins capable of scanning along the length of a DNA strand. These
approaches make use of a number of signals and detection systems to acquire the information
from the sequence-specific tags on the polymer. For instance, fluorescence, atomic force
microscopy (AFM), scanning tunneling microscopy (STM), as well as other electrical and
electromagnetic methods, are suitable for capturing signals and thereby "reading" the
sequence information of a polymer. All of these methods can be characterized and limited by
their spatial resolution. Spatial resolution defines the minimum distance by which two adjacent
probe molecules (e.g., sequence-specific tags) can be separated from each other and still be
detected simultaneously as distinct, separate signals.

Fluorescence detection is often carried out by imaging. Optical resolution of
fluorescence detection systems defines the smallest distance between probes at which they
can still be distinguished. This distance is determined by diffraction. In a confocal
microscopy system, in which the sample is illuminated and viewed through a pinhole in the
image plane, the pinhole size determines the lateral resolution under uniform illumination of
the pinhole. A confocal microscope system can be used in combination with a flow system
that moves a target molecule (e.g., DNA or RNA) through a detection spot in the focal plane
of the microscope. If the target molecules are stretched out in the direction of motion, and
moved singly through the detection spot, then bound fluorescently labeled probes can be
sequentially detected as they enter the detection spot. If the velocity of the target polymer is
known, then the distance between detected probes can be determined from the time between
sequential signals. According to prior art systems, probes that are spatially separated by more
than the spot size can be distinguished from each other.
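The conversion from signal timing to probe spacing described above can be sketched as follows. This is a minimal illustration only, assuming a fully stretched polymer moving at a constant, known velocity; the function name, the units, and the example values are hypothetical and are not taken from any particular instrument.

```python
def probe_spacings(detection_times_s, velocity_um_per_s):
    """Convert detection times of sequential probe signals into distances.

    Assumes the polymer is fully stretched and moves at a constant, known
    velocity through the detection spot (hypothetical units: s and um/s).
    """
    if velocity_um_per_s <= 0:
        raise ValueError("velocity must be positive")
    times = sorted(detection_times_s)
    # Distance between consecutive probes = velocity * time between signals.
    return [velocity_um_per_s * (t2 - t1) for t1, t2 in zip(times, times[1:])]


# Hypothetical example: three probes detected at 1.0, 1.5 and 3.0 ms
# on a polymer moving at 10,000 um/s.
spacings = probe_spacings([0.0010, 0.0015, 0.0030], 10_000.0)
```

Under these assumed values the first two probes are about 5 um apart and the last two about 15 um apart. In a prior art system, spacings smaller than the spot size could not be recovered this way, because the two signals would not be separable in time.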

There is a need for increasing the resolution of detection systems in order to increase 
the amount of data captured from polymer analysis approaches. 

Summary of the Invention 

The invention is based, in part, on the discovery that differential tagging of sequence 
specific probes allows the positions of such probes to be determined with greater spatial 
resolution than could be achieved previously. The invention increases the efficiency of 
polymer sequence analysis by increasing the amount of data that can be captured in a single
analysis. Current methods of polymer analysis are limited by the spatial resolution of the 
detection system used. The invention increases the spatial resolution of several detection 
systems, thereby allowing for a greater amount of sequence information to be obtained during 
individual runs. The invention provides both methods and systems for analyzing polymers 
based on these discoveries. 

In one aspect, the invention provides a method for analyzing a polymer comprising (a)
providing a detection station having a known detection resolution; (b) labeling the polymer
with first and second unit specific markers, the first unit specific marker including a first label
and the second unit specific marker including a second label distinct from the first label,
wherein the first and second unit specific markers are spaced apart on the polymer such that,
if the labels were not distinct from each other, they would be separated by a distance less than
the known detection resolution; (c) exposing the polymer labeled as in (b) to the detection
station to produce distinct first and second signals arising from the first and second labels; and
(d) identifying the distinct first and second signals.
In one embodiment, the first unit specific marker is different from the second unit
specific marker, either in its nature or in the polymer unit it recognizes and binds to. In
another embodiment, the first unit specific marker is identical to the second unit specific
marker, yet the first and second unit specific markers are labeled with distinct labels. Unit
specific markers may be referred to as being "identical to each other" if, although of different
nature, they recognize and bind to the same polymer unit or sequence. The nature of a unit
specific marker refers to its composition (e.g., nucleic acid, peptide, carbohydrate, etc.)
rather than its sequence specificity.

In one embodiment, the first unit specific marker and the second unit specific marker
are positioned at consecutive units along the length of the polymer (i.e., immediately adjacent
to one another). In another embodiment, the first unit specific marker and the second unit
specific marker are spatially separated from one another by at least one unit, or at least two
units.

In a further embodiment, the polymer is labeled with a third unit specific marker.
Preferably, the third unit specific marker comprises a third label. The third unit specific
marker may be positioned relative to the first and second unit specific markers such that the
signal produced by the third unit specific marker is above system detection resolution with
respect to the signals of the first and second unit specific markers. In other words, the third
unit specific marker is spaced apart from the first and second unit specific markers by a
distance greater than the known detection resolution.

In some embodiments, the third unit specific marker is used as a standard against which
multiple data sets are compared.

In this as well as other aspects of the invention, the polymer may be a biological
molecule, but is not so limited. In important embodiments, the polymer is a peptide or a
nucleic acid molecule. In some preferred embodiments, the polymer is a nucleic acid
molecule that is genomic DNA. Accordingly, in some embodiments, the unit specific
markers, including the first, second and subsequent unit specific markers, are nucleic acid
molecules. In other embodiments, the first, second and subsequent unit specific markers are
peptide nucleic acid (PNA) molecules or locked nucleic acid (LNA) molecules. In still other
embodiments, the unit specific markers are peptides or polypeptides. In still another
embodiment, the polymer is composed of a backbone, which optionally includes a label (e.g.,
the backbone can include an inherent label or an extrinsic label).

In one embodiment, the unit specific markers have identical binding specificity. As an 
example, the unit specific markers may be nucleic acid molecules having an identical 
sequence. The length of the marker will depend upon the particular embodiment. Thus, in
one embodiment, a marker that is a nucleic acid molecule is less than 12 bases in length, 
while in another embodiment, the marker that is a nucleic acid molecule is at least 4 bases in 
length. 

In certain embodiments, the first and second unit specific markers (as well as any 
subsequent unit specific markers) are conjugated to a label, preferably a detectable label. In 
some embodiments, the label is selected from the group consisting of an electron spin 
resonance molecule, a fluorescent molecule, a chemiluminescent molecule, a radioisotope, an 
enzyme substrate, an enzyme, a biotin molecule, an avidin molecule, an electrical charge 
transferring molecule, a semiconductor nanocrystal, a semiconductor nanoparticle, a colloid 
gold nanocrystal, a ligand, a microbead, a magnetic bead, a paramagnetic molecule, a 
quantum dot, a chromogenic substrate, an affinity molecule, a protein, a peptide, a nucleic 
acid, a carbohydrate, a hapten, an antigen, an antibody, an antibody fragment, and a lipid. 
In some embodiments, the first, second and subsequent labels are independently
selected from the above group of labels.

In related embodiments, the signals produced from the labels are detected using a 
detection system. The detection system may be non-electrical in nature (such as a 
photographic film detection system), or it may be electrical in nature (such as a charge 
coupled device (CCD) detection system), but is not so limited. In some embodiments, the 
detection system is selected from the group consisting of a charge coupled device detection 
system, an electron spin resonance (ESR) detection system, a fluorescent detection system, an 
electrical detection system, an electromagnetic detection system, a photographic film 
detection system, a chemiluminescent detection system, an enzyme detection system, an 
atomic force microscopy (AFM) detection system, a scanning tunneling microscopy (STM) 
detection system, an optical detection system, a nuclear magnetic resonance (NMR) detection 
system, a near field detection system, and a total internal reflection (TIR) detection system. 

In another aspect, the invention provides a system for optically analyzing a polymer of
linked units comprising (a) an optical source for emitting optical radiation of a known
wavelength; (b) an interaction station for receiving the optical radiation in an optical path and
for sequentially receiving units of the polymer that are exposed to the optical radiation to
produce detectable signals; (c) dichroic reflectors in the optical path for creating at least two
separate wavelength bands of the detectable signals; (d) optical detectors constructed to detect
radiation including the signals resulting from interaction of the units with the optical
radiation; and (e) a processor constructed and arranged to analyze the polymer based on the
detected radiation including the signals.

Preferably, the units of the polymer are bound to unit specific markers which in turn 
are labeled. In such embodiments, the signal derives from the label. 
In one embodiment, the units of the polymer are labeled either directly or indirectly
(e.g., with a labeled unit specific marker) with at least two radiation sensitive labels. In
another embodiment, the units of the polymer are labeled with at least two radiation
insensitive labels. Examples of useful labels include labels that have a size dependent feature
to them, labels that comprise a particular chemical group, etc. Those of ordinary skill in the
art will be familiar with examples of both categories.

In another aspect, the invention provides a system for optically analyzing a polymer.
This system comprises an optical source for emitting optical radiation; an interaction station
for receiving the optical radiation and for receiving a polymer that is exposed to the optical
radiation to produce detectable signals; and a processor constructed and arranged to analyze
the polymer based on the detected radiation including the signals. As described above, the
polymer is bound to at least two unit specific markers that are preferably labeled.

In one embodiment, the interaction station includes a localized radiation spot. In a
further embodiment, the system further comprises a microchannel that is constructed to
receive and advance the polymer units through the localized radiation spot, and which
optionally may produce the localized radiation spot. In another embodiment, the system
further comprises a polarizer, and the optical source includes a laser constructed to emit a
beam of radiation. The polarizer may be arranged to polarize the beam. While laser beams
are intrinsically polarized, certain diode lasers would benefit from the use of a polarizer. In
some embodiments, the localized radiation spot is produced using a slit located in the
interaction station. The slit may have a slit width in the range of 1 nm to 500 nm, or in the
range of 10 nm to 100 nm. In another embodiment, the interaction station includes a
microchannel and a slit having a submicron width arranged to produce the localized radiation
spot. In some embodiments, the polarizer is arranged to polarize the beam prior to reaching
the slit. In other embodiments, the polarizer is arranged to polarize the beam in parallel to the
width of the slit. The foregoing embodiments apply equally to other aspects of the invention.

In yet another embodiment, the optical source is a light source integrated on a chip.
Excitation light may also be delivered using an external fiber or an integrated light guide. In
the latter instance, the system would further comprise a secondary light source from an
external laser that is delivered to the chip.

In yet another aspect, the invention provides a method for analyzing a polymer of
linked units comprising generating optical radiation of a known wavelength to produce a
localized radiation spot at a microchannel to define a detection station having a known
detection resolution; labeling the polymer with first and second unit specific markers, the first
unit specific marker including a first label and the second unit specific marker including a
second label distinct from the first label, wherein the markers are spaced apart on the polymer
such that, if the labels were not distinct from each other, they would be separated by a
distance less than the detection resolution; sequentially exposing the first and second labels to
the localized radiation spot; sequentially detecting radiation of at least two distinct
wavelength bands resulting from interaction of the first and second labels with the localized
radiation spot; and analyzing the polymer using the detected wavelength bands. In one
embodiment, the method further comprises providing the microchannel.

In one embodiment, the method further comprises applying an electric field to move
the polymer through the microchannel. In another embodiment, the method further comprises
applying pressure to move the polymer through the microchannel. In yet another embodiment,
the method further comprises applying suction to move the polymer through the
microchannel. In another embodiment, the detecting includes collecting the signals over time
while the unit specific markers are passing through the microchannel.

In one embodiment, the first and second labels are independently selected from the
group consisting of an electron spin resonance molecule, a fluorescent molecule, a
chemiluminescent molecule, a radioisotope, an enzyme substrate, an enzyme, a biotin
molecule, an avidin molecule, an electrical charge transferring molecule, a semiconductor
nanocrystal, a semiconductor nanoparticle, a colloid gold nanocrystal, a ligand, a microbead, a
magnetic bead, a paramagnetic molecule, a quantum dot, a chromogenic substrate, an affinity
molecule, a protein, a peptide, a nucleic acid, a carbohydrate, a hapten, an antigen, an
antibody, an antibody fragment, and a lipid. In an important embodiment, the first and second
labels are fluorescent labels (i.e., fluorophores).

In one embodiment, detecting includes collecting the first and second signals arising
from the first and second labels while the first and second unit specific markers are moving
through the microchannel.

In one embodiment, the first unit specific marker is different from the second unit
specific marker. In another embodiment, the first unit specific marker is identical to the
second unit specific marker. In an important embodiment, the first and second unit specific
markers are nucleic acid molecules. In a related embodiment, the first and second unit
specific markers are peptide nucleic acid molecules or locked nucleic acid molecules. In one
embodiment, the first and second unit specific markers have an identical nucleotide sequence.
In other embodiments, the first and second unit specific markers have identical binding
specificities (i.e., they recognize and bind to the same polymer unit (or sequence) with the
same affinity). It is to be understood that generally only one marker will be bound to one unit
at a given time. In a related embodiment, the first and second unit specific markers are at
least 4 bases in length. In another embodiment, the first and second unit specific markers are
less than 12 bases in length.

In one embodiment, the first unit specific marker and the second unit specific marker 
are positioned immediately adjacent to one another. In another embodiment, the first unit
specific marker and the second unit specific marker are spatially separated from one another by at
least two units. 

In one embodiment, the polymer is labeled with a third unit specific marker,
preferably comprising a third label. In a related embodiment, the third unit specific marker is
spaced apart from the first and second unit specific markers by a distance greater than the
known detection resolution (i.e., the minimum detection resolution).

In other embodiments, the signals are detected using a detection system of either
electrical or non-electrical nature, such as those listed above.

In one embodiment, the polymer is a nucleic acid molecule. In some embodiments, 
the polymer is genomic DNA. In certain embodiments, the polymer comprises a backbone 
that includes a label. 

In the various aspects of the invention, the pattern of binding of the unit specific
markers to the polymer, and/or the signals derived from such markers may be determined
using a variety of systems including a linear polymer analysis system. In some embodiments,
the linear polymer analysis system is a single polymer analysis system. The nucleic acid
molecule or the binding of the tag molecule to the nucleic acid molecule can be analyzed
using a method selected from the group consisting of Gene Engine™, optical mapping, and
DNA combing. The Gene Engine™ system is described in published PCT Patent
Applications WO98/35012, WO00/09757 and WO01/13088, published on August 13, 1998,
February 24, 2000 and February 22, 2001 respectively, and in U.S. Patent 6,355,420 B1
issued on March 12, 2002, all of which are incorporated herein by reference in their entirety.
Alternatively, the pattern may be determined using fluorescence in situ hybridization (FISH).
Those of skill in the art will be aware of other systems that can be employed to determine the
pattern of binding of the unit specific markers to the polymer.

In still another aspect, the invention provides a method for analyzing a polymer 
comprising labeling a polymer with a set of unit specific markers, wherein each unit specific 
marker of the set of unit specific markers recognizes and binds to units of identical sequence 
within the polymer. Each unit specific marker is labeled with one of at least two distinct 
labels. The method further comprises detecting signals arising from the labels to analyze the 
polymer. The set of unit specific markers can be at least two, at least three, at least four, or
more unit specific markers.

In one embodiment, about 50% of the unit specific markers are labeled with a first 
label and about 50% of the unit specific markers are labeled with a second label. As used in 
this context, "about 50%" preferably means 35%-65%, 40%-60%, 45%-55%, 47%-53%,
or 49%-51%. In other embodiments, it preferably includes 35%, 37%, 40%, 42%, 45%,
47%, 48%, 49%, 50%, 51%, 52%, 53%, 55%, 57%, 60%, 63% and 65%. In another 
embodiment, each unit specific marker is labeled with one of at least three distinct labels. In 
yet another embodiment, each unit specific marker is labeled with one of at least four distinct 
labels. 

In one embodiment, the unit specific markers have an identical sequence. In other
embodiments, the unit specific markers are greater than 4 nucleotides in length or less than 12
bases in length.

In other embodiments, the labels are of a type selected from the group consisting of an
electron spin resonance molecule, a fluorescent molecule, a chemiluminescent molecule, a
radioisotope, an enzyme substrate, an enzyme, a biotin molecule, an avidin molecule, an
electrical charge transferring molecule, a semiconductor nanocrystal, a semiconductor
nanoparticle, a colloid gold nanocrystal, a ligand, a microbead, a magnetic bead, a
paramagnetic molecule, a quantum dot, a chromogenic substrate, an affinity molecule, a
protein, a peptide, a nucleic acid, a carbohydrate, a hapten, an antigen, an antibody, an
antibody fragment, and a lipid. In still other embodiments, the distinct labels are of different
types, and optionally, they are detected using different detection systems.

These and other aspects of the invention will be described in greater detail herein. 

Each of the aspects of the invention can encompass various embodiments of the
invention. It is therefore anticipated that each of the embodiments of the invention involving
any one element or combinations of elements can be included in each aspect of the invention.

Brief Description of the Figures 

Figure 1 is a schematic representation illustrating the effect on spatial resolution of unit
specific markers conjugated with different labels versus unit specific markers conjugated with
identical labels. When both unit specific markers are conjugated with the same label
(e.g., a green fluorescent molecule), the signal from one cannot be resolved from that of the other.
However, when the unit specific markers are conjugated with different labels (e.g., one with a
green fluorescent molecule and the other with a red fluorescent molecule), the signal from
each can be resolved from that of the other. In addition, the Figure indicates that the time between
the distinguishable signal peaks (achievable when different labels are used) is indicative of the
distance between the positions of the unit specific markers on the polymer.
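The effect described for Figure 1 can be illustrated numerically. The sketch below is not taken from the Figure; it assumes Gaussian-shaped signals, ideal separation of the two labels into independent detection channels (no crosstalk), and arbitrary time units. All names and values are hypothetical.

```python
import math


def gaussian(t, center, width):
    """Signal of one label passing the detection spot (Gaussian shape is assumed)."""
    return math.exp(-0.5 * ((t - center) / width) ** 2)


def count_peaks(trace):
    """Count strict local maxima in a sampled trace."""
    return sum(
        1 for i in range(1, len(trace) - 1)
        if trace[i - 1] < trace[i] > trace[i + 1]
    )


def peak_time(times, trace):
    """Time at which a trace reaches its maximum."""
    return times[max(range(len(trace)), key=trace.__getitem__)]


# Hypothetical numbers: two markers pass the spot 1.0 time unit apart, and each
# produces a signal of width 1.0, so the separation is below the resolution.
times = [i * 0.01 for i in range(1001)]          # 0.00 .. 10.00
first = [gaussian(t, 4.5, 1.0) for t in times]   # marker 1
second = [gaussian(t, 5.5, 1.0) for t in times]  # marker 2

# Same label: both signals land in one channel and merge into a single peak.
same_label = [a + b for a, b in zip(first, second)]
peaks_same = count_peaks(same_label)

# Distinct labels: each signal is read in its own channel, so each peak is
# located independently and the time between the peaks is recovered.
delta_t = peak_time(times, second) - peak_time(times, first)
```

With identical labels the summed trace shows one peak, so the two markers cannot be told apart. With distinct labels the peak in each channel is found separately, and the recovered time difference, multiplied by the polymer velocity, gives the distance between the markers.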

Figure 2 is a schematic representation illustrating the binding of unit specific markers
having specificity for different target sequences (i.e., units) on the target polymer being
analyzed. The unit specific markers are conjugated with different labels (e.g., the unit
specific marker with binding specificity for target sequence A is labeled with a red fluorescent
molecule and the unit specific marker with binding specificity for target sequence B is labeled
with a green fluorescent molecule).

Figure 3 is a schematic representation illustrating the binding of unit specific markers
specific for identical target sequences (i.e., units) on the target polymer being analyzed. The
unit specific markers are conjugated with identical labels (e.g., both unit specific markers
have binding specificity for target sequence A and both are labeled with a green fluorescent
molecule).

Figure 4 is a schematic representation illustrating the binding of a mixture of identical
unit specific markers having identical binding specificity to target polymers having two
adjacent target sequences. In this case, 50% of the unit specific markers in the mixture are
labeled with a green fluorescent molecule, and the remaining 50% are labeled with a red
fluorescent molecule. Assuming that the particular label has no effect on the binding
specificity of the unit specific marker, each unit specific marker can bind to both target
sequences with equal probability. Accordingly, 25% of target polymers will have bound to
them two unit specific markers both labeled with a green fluorescent molecule, and 25% of
target polymers will have bound to them two unit specific markers both labeled with a red
fluorescent molecule. Target polymers that are bound in this way do not provide useful
information in and of themselves because identical signals are not resolvable at distances
within the spatial resolution. Information can however be derived from the remaining 50% of
target polymers which have bound to them unit specific markers that are differentially
labeled. In half of these latter cases (i.e., 25% of the total target polymers), the target polymer
has bound to it sequentially a green fluorescent unit specific marker and a red fluorescent unit
specific marker. In the remaining half of cases (i.e., 25% of the total target polymers), the
target polymer has bound to it sequentially a red fluorescent unit specific marker and a green
fluorescent unit specific marker. Differentially labeled unit specific markers can be resolved
from each other even if they are located closer than the spatial resolution of the system.
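The percentages given for Figure 4 follow from counting equally likely labeling patterns, and the same count extends to polymers with more adjacent target sequences. The following sketch enumerates the patterns; it assumes independent binding at each site and equal probability for each label, as stated above, and the function name is hypothetical.

```python
from itertools import product


def resolvable_fraction(n_sites, labels=("green", "red")):
    """Fraction of equally likely labeling patterns in which every pair of
    neighboring sites carries different labels.

    Assumes each site is bound independently and each label is equally likely
    (i.e., the label does not affect binding).
    """
    patterns = list(product(labels, repeat=n_sites))
    usable = [
        p for p in patterns
        if all(a != b for a, b in zip(p, p[1:]))
    ]
    return len(usable) / len(patterns), usable


two_site_fraction, two_site_patterns = resolvable_fraction(2)
three_site_fraction, three_site_patterns = resolvable_fraction(3)
```

For two adjacent sites, two of the four patterns (green-red and red-green) alternate, giving 50%. For three adjacent sites, two of the eight patterns alternate, giving 25%. Passing a longer `labels` tuple shows how additional distinct labels raise the usable fraction under the same assumptions.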

Figure 5 is a schematic representation of the signal outputs from the target polymers 
labeled as in Figure 4. The diagram also indicates the blue signal achieved from a blue 
fluorescent intercalator that binds to the nucleic acid polymer backbone. As indicated in 
Figure 4, target polymers 1 and 2 emit a single indistinguishable (i.e., non-resolvable) signal. 
Target polymers 3 and 4 on the other hand emit slightly overlapping signals which can be 
resolved using the methods of the invention. The ability to resolve the location of the unit 
specific markers (and corresponding units) for target polymers 3 and 4 allows more sequence 
information to be retrieved, especially for unit specific markers (and corresponding units) that 
are located within the spatial resolution limit of prior art methods. 

Figure 6 is a schematic representation of the binding of differentially labeled unit 
specific markers along a polymer having three adjacent target sequences (i.e., units). The 
Figure shows the optimal binding pattern of unit specific markers to be one in which adjacent 
unit specific markers are conjugated to different fluorescent labels. In the situation in which 
the polymer has three adjacent units and half of the identical unit specific markers are labeled 
with a green fluorescent molecule and the remaining half are labeled with a red fluorescent 
molecule, there will be eight possible binding patterns, only two of which (i.e., 25%) will 
yield resolvable sequence information. The two most useful binding patterns are illustrated in 
the Figure.

Figure 7 is a schematic representation of data resulting from the passage of lambda
DNA bound at two units. The lambda DNA passes through two detection regions. The first
detection region captures the backbone and probe information from the DNA. The second
detection region captures the backbone information at a fixed distance from the first detection
region.
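One plausible use of a second detection region at a fixed distance, offered here as an assumption rather than as a statement of what the Figure shows, is to estimate the velocity of the molecule from the delay between the backbone signals seen in the two regions. The function name and all values below are hypothetical.

```python
def velocity_from_two_regions(t_first_s, t_second_s, region_spacing_um):
    """Estimate polymer velocity from backbone arrival times at two regions.

    Assumes the same feature of the backbone signal (e.g., its leading edge)
    is timed in both regions and that the velocity is constant between them.
    """
    transit = t_second_s - t_first_s
    if transit <= 0:
        raise ValueError("second region must be reached after the first")
    return region_spacing_um / transit


# Hypothetical values: regions 20 um apart, leading end seen 2 ms apart.
velocity = velocity_from_two_regions(0.010, 0.012, 20.0)
```

With these assumed values the estimate is about 10,000 um/s. A velocity obtained this way could then be used to convert the time between probe signals into a distance along the DNA.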

Figure 8 is a schematic representation of detection patterns from a polymer (e.g., a
nucleic acid) that has bound to it two red-labeled unit specific markers, one green-labeled unit
specific marker and a blue intercalator along the backbone of the polymer. The Figure
presents the individual images or spatially defined signals that can be achieved using three
different detector systems. The Figure further illustrates the ability to overlay these individual
images in order to arrive at a composite image showing the positioning of both labels along
the length of the polymer.

It is to be understood that the drawings are not required for enablement of the claimed 
invention. 


Detailed Description of the Invention 

The invention relates to systems and methods for achieving high-resolution linear
analysis of polymers using differential tagging. Linear analysis of a polymer often requires a
high-resolution reading of sequence-specific tags. However, the relative spacing of these
sequence-specific tags may be below the resolution of the detection system. In response to
this limitation, the invention provides a method that enables higher resolution in a given
detection system by differentially tagging the linear polymer with distinguishable
sequence-specific tags and capturing the differential signals arising from these tags along the
length of the polymer. As a result of this differential tagging approach, two or more distinct
tags (or as used herein, unit specific markers) that are in close proximity to each other can be
distinguished and thus identified as separate, regardless of whether the distance between them
is below the detection resolution previously achievable using prior art detection systems and
approaches. This allows the location of unit specific markers (and the units to which they
correspond) to be mapped with greater positional certainty than was previously possible.

The nucleic acid molecules can be analyzed using linear polymer analysis systems. A
linear polymer analysis system is a system that analyzes polymers in a linear manner (i.e.,
starting at one location on the polymer and then proceeding linearly in either direction
therefrom). As a polymer is analyzed, the detectable labels attached to it (either directly or
indirectly) are detected in either a sequential or simultaneous manner. When detected
simultaneously, the signals usually form an image of the polymer, from which distances
between labels can be determined. When detected sequentially, the signals are viewed in
histogram form (signal intensity vs. time), which can then be translated into a map, with
knowledge of the velocity of the nucleic acid molecule. It is to be understood that in some
embodiments, the nucleic acid molecule is attached to a solid support, while in others it is free
flowing. In either case, the velocity of the nucleic acid molecule as it moves past, for
example, an interaction station or a detector, will aid in determining the position of the labels,
relative to each other and relative to other detectable markers that may be present on the
nucleic acid molecule.
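The translation of a sequential record (signal intensity vs. time) into a map can be sketched as follows. This is an illustrative outline only: it assumes evenly sampled traces, a constant known velocity, and a simple threshold for locating the leading end of the backbone signal and the label peaks. The function name, threshold, and example data are hypothetical.

```python
def build_map(backbone, label, dt_s, velocity_um_per_s, threshold=0.5):
    """Translate an intensity-vs-time record into label positions (um).

    backbone, label: equal-length lists of intensities sampled every dt_s.
    Positions are measured from the leading end of the polymer, taken here as
    the first sample at which the backbone signal reaches the threshold.
    """
    if len(backbone) != len(label):
        raise ValueError("traces must have the same length")
    start = next((i for i, v in enumerate(backbone) if v >= threshold), None)
    if start is None:
        raise ValueError("no backbone signal above threshold")
    # A label peak is a local maximum of the label trace above the threshold.
    peaks = [
        i for i in range(1, len(label) - 1)
        if label[i] >= threshold and label[i - 1] < label[i] >= label[i + 1]
    ]
    # Elapsed time since the leading end, times velocity, gives position.
    return [(i - start) * dt_s * velocity_um_per_s for i in peaks]


# Hypothetical record sampled every 1 ms from a molecule moving at 1000 um/s.
backbone = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
label = [0, 0, 0, 0.2, 1, 0.2, 0, 0.3, 0.9, 0.1, 0, 0]
positions = build_map(backbone, label, dt_s=0.001, velocity_um_per_s=1000.0)
```

In this made-up record the two label peaks map to roughly 2 um and 6 um from the leading end of the molecule. Running the same routine on each label channel separately yields one map per label, and the maps can then be compared on a common position axis.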

Two general classes of linear analysis, namely fixed molecule and moving molecule
linear analyses, have been described in the art. Linear analysis of fixed molecules
includes methods of fluid-fixing linear molecules such as DNA to
surfaces and using imaging or scanning-based approaches to collect sequence information.
Linear analysis of moving molecules employing either flow or electrophoretic systems has
been described in the art, as discussed below.

An example of a linear polymer analysis system is the Gene Engine™ system
described in PCT patent applications WO98/35012 and WO00/09757, published on August
13, 1998, and February 24, 2000, respectively, and in issued U.S. Patent 6,355,420 B1, issued
March 12, 2002. The contents of these applications and patent, as well as those of other
applications and patents, and references cited herein are incorporated by reference in their
entirety. This system allows single polymers such as single nucleic acid molecules to be
passed through an interaction station in a linear manner. In the case of nucleic acid
molecules, the nucleotides in the nucleic acid molecules are interrogated individually in order
to determine whether there is a detectable label conjugated (e.g., via a unit specific marker) to
the nucleic acid molecule. The detectable label preferably gives rise to the signal detected.
Interrogation involves exposing the nucleic acid molecule to an energy source such as optical
radiation of a set wavelength. In response to the energy source exposure, the detectable label
emits a detectable signal. The mechanism for signal emission and detection will depend on
the type of label sought to be detected.

Other single molecule nucleic acid analytical methods which involve elongation of
a DNA molecule can also be used in the methods of the invention. These include optical
mapping (Schwartz et al., 1993; Meng et al., 1995; Jing et al., 1998; Aston, 1999) and
fiber-fluorescence in situ hybridization (fiber-FISH) (Bensimon et al., 1997). In optical mapping,
nucleic acid molecules are elongated in a fluid sample and fixed in the elongated
conformation in a gel or on a surface. Restriction digestions are then performed on the
elongated and fixed nucleic acid molecules. Ordered restriction maps are then generated by
determining the size of the restriction fragments. In fiber-FISH, nucleic acid molecules are
elongated and fixed on a surface by molecular combing. Hybridization with fluorescently
labeled probe sequences allows determination of sequence landmarks on the nucleic acid
molecules. Both methods require fixation of elongated molecules so that molecular lengths
and/or distances between markers can be measured. Pulsed field gel electrophoresis can also
be used to analyze the labeled nucleic acid molecules. Pulsed field gel electrophoresis is
described by Schwartz et al. (1984). Other nucleic acid analysis systems are described by
Otobe et al. (2001), Bensimon et al. in U.S. Patent 6,248,537, issued June 19, 2001, Herrick
and Bensimon (1999), Schwartz in U.S. Patent 6,150,089 issued November 21, 2000 and U.S.
Patent 6,294,136, issued September 25, 2001. Other linear polymer analysis systems can also
be used, and the invention is not intended to be limited to solely those listed herein.

If confocal laser illumination is used in the analysis of a moving molecule (e.g., flow
analysis of DNA) and the laser is operating in the TEM00 mode, then a Gaussian illumination
pattern can be achieved and the emission of fluorescence from the probe (i.e., the unit specific
marker) will vary to a certain extent according to the Gaussian profile of the illumination.
This results in a non-uniform fluorescent signal as the probe traverses the detection spot. The
fluorescent signal will manifest itself as a peak as the probe passes through the region of
highest excitation intensity. When the resulting temporal pattern of fluorescence signals is
examined, the relative location of adjacent probes on the target can be resolved to better than
the spot size by using the peak output to locate the probe on the target. This is limited,
however, to probes that are spatially separated sufficiently so that two temporally resolved
peaks are present in the detected signal. This creates a minimum resolvable spatial probe
separation, or as referred to herein, the known detection resolution.
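
The limitation described above can be illustrated with a minimal numerical sketch. It models the signal from each of two identically labeled probes as a Gaussian of equal height and width and counts the peaks in the summed trace. The Gaussian width of 100 nm, the sampling step, and the function names are assumptions chosen for illustration and do not describe any particular instrument.

```python
import math


def gaussian(x, center, sigma):
    """Unit-height Gaussian response at position x."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def count_peaks(separation_nm, sigma_nm=100.0, step_nm=1.0):
    """Count local maxima in the summed signal of two probes that carry
    the same label and are separated by separation_nm."""
    half_span = separation_nm / 2.0 + 5.0 * sigma_nm
    n = int(2 * half_span / step_nm) + 1
    xs = [-half_span + i * step_nm for i in range(n)]
    signal = [
        gaussian(x, -separation_nm / 2.0, sigma_nm)
        + gaussian(x, separation_nm / 2.0, sigma_nm)
        for x in xs
    ]
    # A sample is a peak if it is strictly higher than both neighbors.
    return sum(
        1 for i in range(1, n - 1) if signal[i - 1] < signal[i] > signal[i + 1]
    )


peaks_close = count_peaks(150.0)  # probes closer than twice the assumed width
peaks_far = count_peaks(400.0)    # probes well separated
```

In this simplified model the two responses merge into a single peak once the separation falls below twice the Gaussian width, so the two probes can no longer be located individually from the peak output.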

The present invention provides a system that overcomes this limitation in spatial
separation by analyzing polymers using differentially labeled unit specific markers. The
method involves analyzing a polymer by identifying the presence and/or position of labeled
unit specific markers bound along its length. Information can be obtained about the structure
of the polymer including its size, the order of its units (e.g., its sequence), the repetition of its
units (e.g., its complexity), its relatedness to other polymers, or its presence in a biological
sample. For instance, the presence of a marker on a polymer can reveal the identity of the
polymer.

One of the important discoveries of the present invention is the finding that the first
and second unit specific markers can be positioned within the known detection resolution
limit of prior art detection methods and systems. As used herein, the term "known detection
resolution" refers to the closest distance that two markers having the same label can be
positioned relative to each other and still be individually detectable and thus resolvable as two
separate markers, using prior art methods. As will be explained in greater detail below, the
known detection resolution of prior art fluorescence systems is generally λ/2 (i.e., half the
emitted wavelength of the detectable signal). Thus, for systems in which all the fluorescent
labels emit at 532 nm for example, the spatial resolution is 532 nm/2 or 266 nm, which
approximates the distance of 782 base pairs. Accordingly, sequence information could only
be achieved at intervals of approximately 782 base pairs on average using single color
detection systems.
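
The arithmetic behind these figures can be reproduced with the short sketch below. The conversion factor of 0.34 nm per base pair is an assumption (the commonly cited rise per base pair of B-form DNA) that is consistent with the 782 base pair figure above; the function names are illustrative only.

```python
RISE_PER_BP_NM = 0.34  # assumed rise per base pair of B-form DNA, in nm


def known_detection_resolution_nm(emission_wavelength_nm):
    """Known detection resolution of a single color optical system (lambda / 2)."""
    return emission_wavelength_nm / 2.0


def nm_to_base_pairs(distance_nm, rise_nm=RISE_PER_BP_NM):
    """Convert a distance along a linear DNA molecule into base pairs."""
    return distance_nm / rise_nm


resolution_nm = known_detection_resolution_nm(532.0)
resolution_bp = nm_to_base_pairs(resolution_nm)
```

Here `resolution_nm` is 266 nm and `resolution_bp` rounds to 782 base pairs under the stated assumption.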

Using the systems and methods provided herein, it is possible to spatially resolve first
and second unit specific markers when these are located at a distance less than the known
detection resolution. This distance is referred to herein as "below known detection
resolution". The system detection resolution limits that could be achieved prior to the present
invention vary with the type of system. As described herein, an optical detection system such
as a fluorescence system has a resolution limit of λ/2 without using the differential tagging
approach described herein. Accordingly, the below known detection resolution for optical
detection systems is less than λ/2, where λ represents the emission wavelength characteristic
of a single color system. Figure 1 illustrates the effect on spatial resolution of using two
markers having the same label and two probes having different labels. Known detection
resolutions for other detection modalities are known in the art.

While in its simplest form the method involves two distinguishable unit specific
markers, it is possible that additional unit specific markers are used, provided that they too are
distinguishable from the first and second unit specific markers. The third and subsequent unit
specific markers may be positioned relative to the first and second unit specific markers such
that the signal produced by the third and subsequent unit specific marker is above the known
detection resolution with respect to the signals of the first and second unit specific markers.
As used herein, "above known detection resolution" is greater than λ/2, for optical detection
systems.

The methods provided herein are capable of generating signatures for each polymer
based on the specific interactions between unit specific markers and polymers. A signature is
the signal pattern that arises along the length of a polymer as a result of the binding of unit
specific markers (of different or identical sequence) to the polymer. The signature of the
polymer uniquely identifies the polymer.

One type of analysis embraced by the methods described herein involves analyzing
patterns of hybridization of two or more unit specific markers to individual polymers. The
methods of the invention can identify unknown expressed genes by computer analysis of the
hybridization patterns generated. The data obtained from linear analysis of the DNA unit
specific markers are then matched with information in a database to determine the identity of
the target DNA. The methods can thus analyze information from hybridization reactions,
which can then be applied to diagnostics and determination of gene expression patterns.

A "polymer" as used herein is a compound having a linear backbone to which 
monomers are linked together by linkages. The polymer is made up of a plurality of 
individual monomers. An individual monomer as used herein is the smallest building block
that can be linked directly or indirectly to other building blocks or monomers to form a 
polymer. At a minimum, the polymer contains at least two linked monomers. The particular 
type of monomer will depend upon the type of polymer being analyzed. 

The term "backbone" is given its usual meaning in the field of polymer chemistry.
The polymers may be heterogeneous in backbone composition thereby containing any
possible combination of polymer monomers linked together. In a preferred embodiment the
polymers are homogeneous in backbone composition and are, for example, nucleic acids,
polypeptides, polysaccharides, carbohydrates, polyurethanes, polycarbonates, polyureas,
polyethyleneimines, polyarylene sulfides, polysiloxanes, polyimides, polyacetates,
polyamides, polyesters, or polythioesters.

As used herein with respect to linked monomers of a polymer, "linked" or "linkage"
means two entities are bound to one another by any physicochemical means. Any linkage
known to those of ordinary skill in the art, covalent or non-covalent, is embraced. Such
linkages are well known to those of ordinary skill in the art. Natural linkages, which are those
ordinarily found in nature connecting the individual monomers of a particular polymer, are
most common. Natural linkages include, for instance, amide, ester and thioester linkages.
The individual monomers of a polymer may be linked, however, by synthetic or modified
linkages. Polymers in which monomers are linked by covalent bonds will be most common.
The polymer may be branched, but preferably it is linear.

In preferred embodiments, the polymer is a biological molecule. As used herein, a
biological molecule is a molecule that is found in, or is functional in, a biological environment
such as a cell. In some embodiments, the polymer is a peptide or a nucleic acid. A "peptide"
as used herein is a polymer comprised of linked amino acids. A "nucleic acid" as used herein
is a polymer comprised of linked nucleotides, and includes deoxyribonucleic acid (DNA) or
ribonucleic acid (RNA). DNA is a polymer comprised of a phosphodiester backbone
composed of monomers of purines and pyrimidines such as adenine, cytosine, guanine,
thymine, 5-methylcytosine, 2-aminopurine, 2-amino-6-chloropurine, 2,6-diaminopurine,
hypoxanthine, and other naturally and non-naturally occurring nucleobases, substituted and
unsubstituted aromatic moieties. RNA is a polymer comprised of a phosphodiester backbone
composed of monomers of purines and pyrimidines such as those described for DNA except
that uracil is substituted for thymine. DNA monomers may be linked to each other by their
5' or 3' hydroxyl group thereby forming an ester linkage. RNA monomers may be linked to
each other by their 5', 3' or 2' hydroxyl group thereby forming an ester linkage. Alternatively,
DNA or RNA monomers having a terminal 5', 3' or 2' amino group may be linked to each
other by the amino group thereby forming an amide linkage. In some instances, the polymer
is a peptide nucleic acid (PNA), or a locked nucleic acid (LNA). In some important
embodiments, the unit specific marker is a PNA or a LNA as described below.

Whenever a nucleic acid is represented by a sequence of letters it will be understood
that the nucleotides are in 5'-> 3' order from left to right and that "A" denotes adenosine, "C"
denotes cytosine, "G" denotes guanosine, "T" denotes thymidine, and "U" denotes uracil
unless otherwise noted.

The nucleic acid molecules used as targets may be DNA (e.g., genomic DNA
including nuclear and mitochondrial DNA), or RNA, or amplification products or
intermediates thereof, including complementary DNA (cDNA). The nucleic acid molecules
can be directly harvested and isolated from a biological sample (such as a tissue or a cell
culture) without the need for prior amplification using techniques such as polymerase chain
reaction (PCR). In related embodiments, the nucleic acid molecule is a fragment of a
genomic nucleic acid molecule.

The nucleic acid molecules may be single stranded or double stranded nucleic acids.
Harvest and isolation of nucleic acid molecules are routinely performed in the art and suitable
methods can be found in standard molecular biology textbooks (e.g., Maniatis'
Handbook of Molecular Biology).

In important embodiments of the invention, the nucleic acid molecule is a non in vitro
amplified nucleic acid molecule. As used herein, a "non in vitro amplified nucleic acid
molecule" refers to a nucleic acid molecule that has not been amplified in vitro using
techniques such as polymerase chain reaction or recombinant DNA methods. A non in vitro
amplified nucleic acid molecule may however be a nucleic acid molecule that is amplified in
vivo (in the biological sample from which it was harvested) as a natural consequence of the
development of the cells in vivo. This means that the non in vitro amplified nucleic acid
molecule may be one which is amplified in vivo as part of locus amplification, which is
commonly observed in some cell types as a result of mutation or cancer development.

The size of the nucleic acid molecule is not critical to the invention and it is generally
limited only by the detection system used. It can be several nucleotides in length, several
hundred, several thousand, or several million nucleotides in length. In some embodiments,
the nucleic acid molecule may be the length of a chromosome.

Peptide nucleic acids (PNAs) are DNA analogs having their phosphate backbone
replaced with 2-aminoethyl glycine residues linked to nucleotide bases through glycine amino
nitrogen and methylenecarbonyl linkers. PNAs can bind to both DNA and RNA targets by
Watson-Crick base pairing, and in so doing form stronger hybrids than would be possible with
DNA or RNA based markers. Several types of PNA designs exist, and these include but are
not limited to single strand PNA (ssPNA), bisPNA, and pseudocomplementary PNA (pcPNA).

Peptide nucleic acids (PNAs) are synthesized from monomers connected by a peptide
bond (Nielsen and Egholm 1999). These can be built with standard solid phase peptide
synthesis technology. PNA chemistry and synthesis also allows for inclusion of amino acids
and polypeptide sequences in the PNA design. For example, lysine residues can be used to
introduce positive charges in the PNA backbone. All chemical approaches available for the
modifications of amino acid side chains are directly applicable to PNAs.

Locked nucleic acids (LNAs) form hybrids with DNA, which are at least as stable as
PNA/DNA hybrids (Braasch and Corey 2001). Therefore, LNA can be used just as PNA
molecules would be. LNA binding efficiency can be increased in some embodiments by
adding positive charges to the LNA marker. LNAs have been reported to have increased
binding affinity inherently. Commercial nucleic acid synthesizers and standard
phosphoramidite chemistry are used to make LNAs.

Peptides and polypeptides are polymers comprised of a peptide backbone composed of
monomers of amino acids, which include the 20 naturally occurring amino acids as well as
modified amino acids. Amino acids may exist as amides or free acids that are linked to each
other and to the backbone through their α-amino group thereby forming an amide linkage.
Amino acid designations as used herein correspond to the triplet or single letter designations
that are commonly used in the art.

The polymers may be "native polymers" which are naturally occurring, or
alternatively they may be non-naturally occurring polymers which do not exist in nature. The
polymers typically include at least a portion of a naturally occurring polymer. The polymers
can be isolated or synthesized de novo. For example, the polymers can be isolated from
natural sources (e.g., purified, as by cleavage and gel separation) or may be synthesized, e.g., (i)
amplified in vitro by, for example, polymerase chain reaction (PCR); (ii) synthesized by, for
example, chemical synthesis; (iii) recombinantly produced by cloning, etc. An example of an
isolated polymer suitable for analysis using the methods described herein is genomic DNA
harvested from a cell, tissue or subject.

The methods of the invention are used to analyze polymers based on markers that
recognize and bind to units within a polymer. A "unit" of a polymer, as used herein, refers to
a particular linear arrangement of one or preferably more monomers (i.e., a particular defined
sequence of monomers) within a target polymer. For example, a unit in a nucleic acid
consists of a particular sequence of nucleotides linked to one another. A nucleic acid unit
may consist of one, or two nucleotides (i.e., a dinucleotide or a 2-mer), or three nucleotides
(i.e., a trinucleotide or a 3-mer), or four nucleotides (i.e., a tetranucleotide or a 4-mer), and so
on. The unit may be of any length. As used herein, the polymer being analyzed using the
methods of the invention is referred to as a "target polymer".

The units are identified within the polymer by the use of unit specific markers. "Unit
specific markers" are molecules that specifically recognize and bind to particular units within
a polymer in a sequence dependent manner. The terms "unit specific marker" and "marker"
are used interchangeably herein. An example of a unit specific marker is a probe (e.g., a
nucleic acid probe). The method of the invention comprises first labeling a polymer with at
least two unit specific markers (such as for example, a first and a second unit specific
marker). As used herein, a polymer that is bound by a unit specific marker is referred to as
"labeled" with that unit specific marker. The position of the unit specific marker along the
length of a target polymer indicates the location of a particular unit in the polymer. If a unit
specific marker binds to a target polymer under conditions that favor specific binding, this
indicates that the corresponding unit (and sequence) is present in the polymer. If a unit
specific marker fails to bind to a target polymer under the same conditions, this generally
indicates that the corresponding unit (and sequence) is not present in the polymer. It is to be
understood that in the case of nucleic acid molecules, the sequences of the unit specific
marker and the unit in the target nucleic acid are complementary to each other.
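
The complementarity relationship can be sketched computationally as follows. This minimal example assumes a single stranded DNA target written 5'-> 3', a DNA marker that binds only to its exact reverse complement, and ideal specific binding conditions; the sequences and function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'-> 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def binding_positions(target, marker):
    """Return 0-based start positions of every unit in the target that is
    exactly complementary to the marker (overlapping units included)."""
    unit = reverse_complement(marker)
    positions = []
    start = target.find(unit)
    while start != -1:
        positions.append(start)
        start = target.find(unit, start + 1)
    return positions


target = "GGATCCATTGACGGATCA"                 # hypothetical target, 5'-> 3'
sites = binding_positions(target, "GTCAAT")   # marker complementary to ATTGAC
absent = binding_positions(target, "ACGT")    # unit not present in the target
```

An empty result corresponds to the case in which the marker fails to bind, indicating that the corresponding unit is not present in the target.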

The unit specific marker may itself be a polymer but it is not so limited. Examples of
suitable polymers are nucleic acid molecules (useful as unit specific markers for target
polymers that are themselves nucleic acids) and peptides and polypeptides (useful as unit
specific markers for target polymers that are nucleic acids and peptides). Other unit specific
markers include but are not limited to sequence specific major and minor groove binders and
intercalators, peptide binding proteins, nucleic acid binding peptides or polypeptides, and
sequence-specific peptide-nucleic acids, etc. Many unit specific markers exist and are known
to those of skill in the art. As discussed above, the unit specific marker can also be a PNA or
a LNA.

The unit specific marker can be of any length, as can the unit to which it binds. The
length of the marker will depend upon the particular embodiment. The marker length may
range from 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, or more nucleotides (including every integer
therebetween as if explicitly recited herein). In many embodiments, shorter markers are more
desirable. In instances in which the polymer and the marker are both nucleic acids, the length
of the unit and the unit specific marker are generally the same. This is not necessarily so if
either or both the target polymer or the unit specific marker are not nucleic acids. The method
embraces the simultaneous use of two or more unit specific markers that may be identical in
nature or unit binding specificity. For example, the unit specific markers may recognize and
bind specifically to identical units but they may themselves be different in their composition
(e.g., one unit specific marker may be a nucleic acid and one may be a peptide). In some
preferred embodiments, the unit specific markers are identical in their composition regardless
of whether they recognize and bind specifically to identical units.

As stated above, the unit specific markers themselves may have identical binding
specificity. Markers with identical binding specificity bind with the same affinity to units
having the same sequence. Accordingly, these markers will bind with equal probability to
their target units in the polymer, provided the label of the marker does not interfere with
sequence recognition and binding of the marker to the unit.

In one embodiment of the invention, a set of unit specific markers all of which have
identical binding specificity is used. Preferably the set of markers is divided into as many
equal parts as possible, with each equal part labeled with a different label. For example, the
set may be divided into two equal parts, one of which is labeled with a green fluorescent label
(emitting at about 530 nm) and the other is labeled with a red fluorescent label (emitting at
about 575 nm). As another example, the set may be divided into three equal parts, one of
which is labeled with a green fluorescent label, one of which is labeled with a red fluorescent
label, and the remaining one is labeled with a far-red fluorescent label (emitting at about 630
nm). The set may include at least 3, at least 4, at least 5, or more unit specific markers
differentially labeled.
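
One consequence of dividing the set into equal, differently labeled parts can be sketched numerically. If each target unit is bound by a marker drawn with equal probability from any of n parts (an assumption that follows from the equal binding probability noted above, and that ignores incomplete binding), two neighboring bound markers carry different labels with probability 1 - 1/n. The function names and simulation parameters below are illustrative only.

```python
import random


def prob_neighbors_differ(n_labels):
    """Probability that two neighboring bound markers carry different labels
    when each is drawn independently and uniformly from n_labels parts."""
    return 1.0 - 1.0 / n_labels


def simulate_neighbors_differ(n_labels, n_sites=100_000, seed=0):
    """Estimate the same probability by random label assignment."""
    rng = random.Random(seed)
    labels = [rng.randrange(n_labels) for _ in range(n_sites)]
    differing = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return differing / (n_sites - 1)


two_color = prob_neighbors_differ(2)               # 0.5
three_color_estimate = simulate_neighbors_differ(3)  # close to 2/3
```

Under these assumptions, increasing the number of equal parts increases the fraction of neighboring marker pairs that are differentially labeled.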

Alternatively, the unit specific markers may have different binding specificities. As
used herein, markers with different binding specificities recognize and bind to different
sequences (i.e., different units) in the target polymer. Unit specific markers recognizing and
binding to a first unit may be labeled identically or they too may be differentially labeled,
with the proviso that no single label is used to label markers for different sequences. This
means that each signal arising from a labeled marker will denote only one unit or sequence
along the length of the polymer.

In one important embodiment, the polymer being analyzed is a nucleic acid (i.e., a 
polymer of nucleotides), and the unit specific marker is another nucleic acid having a 
sequence that allows it to hybridize to the target polymer in a sequence specific manner. 
When the target polymer is a nucleic acid, the sequence of the unit specific marker will be 
complementary to the sequence of the unit to which it binds in the target polymer. 

The first unit specific marker and the second unit specific marker may be but need not 
be positioned immediately adjacent (i.e., contiguous) to one another. As used herein, the term 
"positioned immediately adjacent to one another" means that no identical units are located 
between two units or in some instances, that no monomers are located between two units. 
The position of units and markers along a target polymer will depend upon the length of the 
unit and the randomness of sequence distribution in the target molecule. For example, if the 
target unit comprises within its sequence a repetitive sequence (such as a poly-A sequence, an 
Alu repeat, or a CG dinucleotide), then it is more likely that the unit specific markers will be 
positioned relatively close to one another. If however the unit specific marker consists of 6
randomly selected nucleotides, then by chance there will be on average approximately 4096
bases between consecutive units along the length of the polymer.
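
The figure of 4096 follows from a simple counting argument, sketched below. The sketch assumes a random target sequence in which the four nucleotides occur independently and with equal frequency, which real genomes only approximate; the function name is illustrative.

```python
def expected_spacing_bases(unit_length, alphabet_size=4):
    """Average number of bases between consecutive occurrences of one
    specific unit in a random sequence with equiprobable monomers."""
    return alphabet_size ** unit_length


# Expected spacing for unit lengths of 4 through 8 nucleotides.
spacings = {k: expected_spacing_bases(k) for k in range(4, 9)}
six_mer_spacing = spacings[6]
```

For a unit of 6 nucleotides this gives 4 to the power of 6, or 4096 bases, and each additional nucleotide in the unit increases the expected spacing fourfold.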

The degree to which each target unit is bound by a unit specific marker will also 
depend upon the efficiency of binding (including the binding or hybridization conditions) and 
the concentration of the unit specific marker relative to the concentration of target polymer 
(and target units). If the binding efficiency is low or if the concentration of unit specific 
marker is not saturating, then the first and the second unit specific markers may be spatially 
separated from one another by one, two, three, or more units which will not be bound by unit 
specific markers. 

The ability of the unit specific marker to bind specifically to its target unit will also 
depend upon its length and composition (particularly for markers that are nucleic acids) and 
the conditions under which interaction (i.e., binding) occurs. Unit specific markers will bind 
specifically to a unit of a particular sequence and not to units that differ in sequence from the 
target. If the polymer and the unit specific marker are both nucleic acids, the conditions can 
be manipulated so that only complementary sequences will bind to each other. Persons of 
ordinary skill in the art will know how to achieve and test for such stringent conditions. 
Reference can also be made to Molecular Cloning: A Laboratory Manual, J. Sambrook, et al.,
eds., Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York,
1989, or Current Protocols in Molecular Biology, F.M. Ausubel, et al., eds., John Wiley &
Sons, Inc., New York, for guidance in stringent hybridization conditions. If the unit specific
marker is a peptide or polypeptide, the conditions are similarly manipulated so that only
specific binding of the marker to a specific unit on the target polymer (which may be nucleic
acid or peptide in nature itself) will occur. However, in some instances, binding conditions
may be adjusted to allow the unit specific marker to bind to polymer units that are not
completely complementary. This latter approach is useful when a less than exact sequence of
the polymer is sought.

The unit specific markers are resolvable when located relative to each other at a
distance less than the known detection resolution because they are differentially labeled. As
used herein, "differentially labeled unit specific markers" are unit specific markers that are
labeled (e.g., conjugated) with different labels that emit different and distinct signals.
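
The reason differential labeling permits resolution below the known detection resolution can be illustrated with a simplified sketch. It assumes that each label is recorded in its own detection channel with no crosstalk between channels, that each channel contains a single Gaussian response, and that the position of a marker is taken as the location of the peak in its channel. The channel names, the Gaussian width, and the function names are assumptions made for illustration.

```python
import math


def gaussian(x, center, sigma):
    """Unit-height Gaussian response at position x."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def locate_by_channel(marker_positions_nm, sigma_nm=100.0, step_nm=1.0):
    """Estimate the position of each differentially labeled marker from the
    peak of its own channel trace.

    marker_positions_nm maps a channel name to the true position of the one
    marker detected in that channel.
    """
    lo = min(marker_positions_nm.values()) - 5.0 * sigma_nm
    hi = max(marker_positions_nm.values()) + 5.0 * sigma_nm
    n = int((hi - lo) / step_nm) + 1
    xs = [lo + i * step_nm for i in range(n)]
    estimates = {}
    for channel, position in marker_positions_nm.items():
        trace = [gaussian(x, position, sigma_nm) for x in xs]
        estimates[channel] = xs[trace.index(max(trace))]
    return estimates


# Two markers only 80 nm apart, i.e. well below lambda/2 for visible light.
estimates = locate_by_channel({"green": 0.0, "red": 80.0})
separation_nm = estimates["red"] - estimates["green"]
```

Because each channel contains only one peak, the two markers are located independently even though their responses would merge into a single peak if both carried the same label.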

A "label" as used herein is a molecule or compound that can be detected by a variety 
of methods including fluorescence, electrical conductivity, radioactivity, size, and the like. 
The label may be intrinsically capable of emitting a signal, such as, for example, a fluorescent
label that emits light of a particular wavelength following excitation by light of another, lower
characteristic wavelength. Alternatively, the label may not be capable of intrinsically
emitting a signal but it may be capable of being bound by another compound that does emit a 
signal. An example of this latter situation is a label such as biotin which itself does not emit a 
signal but which when bound to labeled avidin or streptavidin molecules can be detected. 
Other examples of this latter kind of label are ligands that bind specifically to particular 
receptors. Detectably labeled receptors are allowed to bind to ligand labeled unit specific 
markers in order to visualize such markers. Other label types are recited more fully herein. 

The label produces a characteristic signal following interaction with an energy source 
such as a laser beam of a given wavelength (or range of wavelengths), or a current. While it 
is possible that either the target polymer or the unit specific marker are intrinsically labeled, it 
is preferable to use extrinsically labeled unit specific markers in the methods described herein. 
The type of extrinsic label selected will depend on a variety of factors, including the nature of 
the analysis being conducted, the type of the energy source used and the type of polymer. 
Extrinsic label compounds include but are not limited to light emitting compounds, electron 
emitting or absorbing compounds, spin labels, and heavy metal compounds. The label should 
be sterically and chemically compatible with the units of the polymer being analyzed, and
with the unit specific markers used. The extrinsic label should not interfere with the binding 
of the unit specific marker to the target polymer, nor should it impact upon the binding 
specificity of the unit specific marker. 

Other labels that may be used according to the invention include but are not limited to
an electron spin resonance molecule, a fluorescent molecule, a chemiluminescent molecule, a
radioisotope, an enzyme substrate, an enzyme, a biotin molecule, an avidin molecule, an
electrical charge transferring molecule, a semiconductor nanocrystal, a semiconductor
nanoparticle, a colloid gold nanocrystal, a ligand, a microbead, a magnetic bead, a
paramagnetic molecule, a quantum dot, a chromogenic substrate, an affinity molecule, a
protein, a peptide, a nucleic acid, a carbohydrate, a hapten, an antigen, an antibody, an antibody
fragment, and a lipid.

Radioisotopes can be detected with film or charge coupled devices (CCDs), ligands
can be detected by binding of a receptor having a fluorescent, chemiluminescent or enzyme
tag, and microbeads can be detected using electron or atomic force microscopy. The label can
be incorporated into the unit specific marker at the time of synthesis or by conjugation
following synthesis.

A "detectable signal" as used herein is any type of signal which can be sensed by
conventional technology. The signal produced depends on the type of energy source as well
as the nature of the marker and its label. Preferably the signal is electromagnetic radiation
resulting from light emission from the labeled unit specific marker bound to the polymer.

The labels bound to the unit specific markers may be of the same type, e.g., they may all be
fluorescent labels, or they may all be radioactive labels, or they may all be nuclear magnetic
labels. This latter configuration may be preferable in some embodiments. Labels that are of
the same type are still distinguishable from each other based on the signal they produce once
in contact with an energy source (such as for example optical radiation). As an example, two
fluorescent labels are distinguishable if they emit fluorescent radiation of different
wavelengths. Alternatively, the unit specific marker labels may be of a different type, e.g.,
one label may be a fluorescent label and one may be a radioactive label.

A "light emissive compound" or "light emitting compound" as used herein is a 
compound that emits light in response to irradiation with light of a particular wavelength.
These compounds are capable of absorbing and emitting light through phosphorescence,
chemiluminescence, luminescence, polarized fluorescence, or, more preferably, fluorescence.
The particular light emissive compound selected will depend on a variety of factors which are 
discussed in greater detail below. 

Chemiluminescent compounds are compounds which luminesce due to a chemical
reaction. Phosphorescent compounds are compounds which exhibit delayed luminescence as
a result of the absorption of radiation. Luminescence is a non-thermal emission of
electromagnetic radiation by a material upon excitation. These compounds are well known in
the art.

Generally, fluorescent compounds are hydrocarbon molecules having a chain of
several conjugated double bonds. The absorption and emission wavelengths of a dye are
approximately proportional to the number of carbon atoms in the conjugated chain. A
preferred fluorescent compound is "Cy-3" (Biological Detection Systems, Pittsburgh, PA).
Other preferred fluorescent compounds useful according to the invention include but are not
limited to fluorescein isothiocyanate ("FITC"), Texas Red™, tetramethylrhodamine
isothiocyanate ("TRITC"), 4,4-difluoro-4-bora-3a,4a-diaza-s-indacene ("BODIPY"),
Cy-Chrome™, R-phycoerythrin (R-PE), PerCP, allophycocyanin (APC), PharRed™, Marina
Blue, Alexa™ 350, and Cascade Blue®. Some light emissive compounds are combinations
of fluorophores. These compounds are often referred to as "piggyback" fluorophores because
they are composed of two fluorophores in close proximity to each other. In such compounds,
one of the fluorophores is able to absorb the energy from the laser source, and emits energy
when returning to the ground state which the other fluorophore can absorb. The resulting
signal is derived from the second fluorophore upon its return to a less excited state.
Piggyback compounds expand the range of fluorescent signals which can be derived from an
energy source of a single wavelength.

In one embodiment of the invention the light emissive compound is a donor or an 
acceptor fluorophore. A fluorophore as used herein is a molecule capable of absorbing light 
at one wavelength and emitting light at another wavelength. A donor fluorophore is a 
fluorophore which is capable of transferring its fluorescent energy to an acceptor molecule in 
close proximity. An acceptor fluorophore is a fluorophore that can accept energy from a 
donor in close proximity. (An acceptor does not have to be a fluorophore. It may be
non-fluorescent.) Fluorophores can be photochemically promoted to an excited state, or higher
energy level, by irradiating them with light. Excitation wavelengths are generally in the 
ultraviolet, blue, or green regions of the spectrum. The fluorophores remain in the excited 
state for a very short period of time before releasing their energy and returning to the ground 
state. Those fluorophores that dissipate their energy as emitted light are donor fluorophores. 
The wavelength distribution of the outgoing photons forms the emission spectrum, which 
peaks at longer wavelengths (lower energies) than the excitation spectrum, but is equally 
characteristic for a particular fluorophore.
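
The relationship between emission wavelength and photon energy noted above can be illustrated with a short calculation. The following Python sketch is illustrative only and is not part of the disclosed method. The function names are hypothetical, and the example wavelengths (488 nm absorption, 525 nm emission) are the fluorescein values listed in Table 1.

```python
# Illustrative sketch only: function names are hypothetical.
# Physical constants are the exact SI (2019) values.
PLANCK_J_S = 6.62607015e-34       # Planck constant, J*s
LIGHT_SPEED_M_S = 2.99792458e8    # speed of light in vacuum, m/s
JOULES_PER_EV = 1.602176634e-19   # joules per electronvolt


def photon_energy_ev(wavelength_nm: float) -> float:
    """Return the energy (eV) of a photon with the given wavelength (nm)."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    wavelength_m = wavelength_nm * 1e-9
    return PLANCK_J_S * LIGHT_SPEED_M_S / wavelength_m / JOULES_PER_EV


def wavelength_shift_nm(absorption_nm: float, emission_nm: float) -> float:
    """Return how far the emission peak lies to the red of the absorption peak."""
    return emission_nm - absorption_nm


if __name__ == "__main__":
    absorbed = photon_energy_ev(488.0)   # fluorescein absorption, Table 1
    emitted = photon_energy_ev(525.0)    # fluorescein emission, Table 1
    print(f"absorbed {absorbed:.3f} eV, emitted {emitted:.3f} eV, "
          f"shift {wavelength_shift_nm(488.0, 525.0):.0f} nm")
```

For the fluorescein values, the emitted photon carries roughly 2.36 eV against roughly 2.54 eV absorbed, consistent with an emission spectrum that peaks at a longer wavelength than the excitation spectrum.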

Table 1 indicates the various types of light emissive compounds available, along with 
their characteristic absorption and emission spectra and lasers that are suitable for their 
excitation. Fluorescently conjugated nucleotides, such as Cy3 and Cy5 labeled thymidine and 
cytosine, are commercially available from Amersham Pharmacia Biotech. Single labeled 
nucleotides are used in a standard automated nucleic acid synthesis along with non-labeled 
versions of the remaining three nucleotides. Depending upon the nucleotide content of the 
unit specific marker being synthesized (i.e., the nucleic acid probe), it may be necessary to 
include both labeled and unlabeled versions of the same nucleotide in a given synthesis 
reaction, in order to equalize the fluorescence from different unit specific markers. 

Although most fluorophores exhibit a peak wavelength of emission, their emission
spectra also span a range of wavelengths, resulting in the possibility that one fluorophore may
emit into the detection channel of another fluorophore. In order to reduce the overlap in
fluorescence between fluorophores, the signal from each into the detector of another is
attenuated by compensation. This technique is known and routinely practiced in the art of
flow cytometry. Briefly, for each fluorophore, a proportion of the signal measured in its
intended detector is subtracted from the signal measured in the detector of another
fluorophore, thereby removing the contribution of the first fluorophore to that detector.
Compensation should be performed when using combinations of fluorophores
having broad, overlapping emission spectra.
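
The compensation described above can be sketched for the two-fluorophore case. The following Python example is a minimal illustration under stated assumptions and is not the procedure of any particular instrument: it assumes that spillover is linear, that each spillover fraction has been measured beforehand with singly labeled controls, and that the function and parameter names are hypothetical.

```python
# Minimal two-channel compensation sketch (hypothetical names, linear model).
# Assumed model:
#   measured_a = true_a + spill_b_into_a * true_b
#   measured_b = true_b + spill_a_into_b * true_a
from typing import Tuple


def compensate_two_channels(
    measured_a: float,
    measured_b: float,
    spill_a_into_b: float,
    spill_b_into_a: float,
) -> Tuple[float, float]:
    """Recover the signal of fluorophores A and B in their intended detectors."""
    determinant = 1.0 - spill_a_into_b * spill_b_into_a
    if abs(determinant) < 1e-12:
        raise ValueError("spillover fractions make the channels inseparable")
    # Subtract the other fluorophore's spillover, then rescale by the determinant.
    true_a = (measured_a - spill_b_into_a * measured_b) / determinant
    true_b = (measured_b - spill_a_into_b * measured_a) / determinant
    return true_a, true_b


if __name__ == "__main__":
    # A emits 1000 units and leaks 15% into detector B;
    # B emits 400 units and leaks 5% into detector A.
    measured_a = 1000.0 + 0.05 * 400.0
    measured_b = 400.0 + 0.15 * 1000.0
    print(compensate_two_channels(measured_a, measured_b, 0.15, 0.05))
```

When only one fluorophore spills into the other detector, the determinant is 1 and the calculation reduces to the simple subtraction described in the text.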


Table 1

Compound               Absorption        Emission          Laser Type and
                       Wavelength (nm)   Wavelength (nm)   Wavelength (nm)

Marina Blue            360               460
Alexa™ 350             360               445
Cascade Blue®          408               430               405 nm diode
Cascade Yellow         408               510               405 nm diode
Fluorescein (FITC)     488               525               488 nm Argon
Phycoerythrin (R-PE)   488               575               488 nm Argon
Cy-Chrome™ (Cy-5)      488               670               488 nm Argon
PerCP™                 488               675               488 nm Argon
Texas Red®             595               610               Argon-Krypton or Dye
APC                    595               660               Helium-Neon or Krypton
PharRed™ (Cy7-APC)     595 or 633        780               Helium-Neon
BODIPY
Rhodamine (TRITC)      544               572               532 nm or 543 nm
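
As an illustration of how the entries of Table 1 might be used to choose labels that share one excitation wavelength yet remain distinguishable by emission, the following Python sketch encodes a subset of the table and selects compounds whose emission peaks are separated by a minimum gap. The function names and the 30 nm gap are hypothetical choices made for this example only; the wavelength values are those listed in Table 1.

```python
# Illustrative sketch: names and the min_gap_nm value are assumptions.
from typing import Dict, List, Tuple

# (absorption nm, emission nm) for a subset of Table 1 entries.
TABLE_1: Dict[str, Tuple[int, int]] = {
    "Marina Blue": (360, 460),
    "Alexa 350": (360, 445),
    "Cascade Blue": (408, 430),
    "Cascade Yellow": (408, 510),
    "Fluorescein (FITC)": (488, 525),
    "Phycoerythrin (R-PE)": (488, 575),
    "Cy-Chrome": (488, 670),
    "PerCP": (488, 675),
    "Texas Red": (595, 610),
    "APC": (595, 660),
    "Rhodamine (TRITC)": (544, 572),
}


def compounds_for_excitation(absorption_nm: int) -> List[str]:
    """Compounds listed with the given absorption wavelength, by emission."""
    names = [n for n, (ab, _) in TABLE_1.items() if ab == absorption_nm]
    return sorted(names, key=lambda n: TABLE_1[n][1])


def spectrally_separated(names: List[str], min_gap_nm: int) -> List[str]:
    """Greedily keep compounds whose emission peaks are min_gap_nm apart."""
    chosen: List[str] = []
    for name in sorted(names, key=lambda n: TABLE_1[n][1]):
        if not chosen or TABLE_1[name][1] - TABLE_1[chosen[-1]][1] >= min_gap_nm:
            chosen.append(name)
    return chosen


if __name__ == "__main__":
    candidates = compounds_for_excitation(488)
    print(spectrally_separated(candidates, min_gap_nm=30))
```

With a 488 nm source and a 30 nm gap, the sketch keeps fluorescein, R-PE and Cy-Chrome and drops PerCP, whose listed emission peak lies only 5 nm from that of Cy-Chrome.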


Radioactive compounds are substances which emit alpha, beta or gamma nuclear
radiation. Alpha rays are positively charged particles of mass number 4 and are slightly deflected
by electrical and magnetic fields. Beta rays are negatively charged electrons and are strongly
deflected by electrical and magnetic fields. Gamma rays are photons of electromagnetic
radiation, are undeflected by electrical and magnetic fields, and have a wavelength of the
order of 10^-8 to 10^-9 cm. The radioactive compound emits nuclear radiation as it passes the
station. When the station is a scintillation layer, the nuclear radiation interacts with the
scintillation layer and causes fluorescent excitation. A fluorescent signal indicative of the
radioactively labeled marker can then be detected.

The unit specific markers and/or polymers can be labeled using antibodies or antibody
fragments and their corresponding antigen or hapten binding partners. Detection of such
bound antibodies and proteins or peptides is accomplished by techniques well known to those
skilled in the art. Use of hapten conjugates such as digoxigenin or dinitrophenyl is also well
suited herein. Antibody/antigen complexes which form in response to hapten conjugates are
easily detected by linking a label to the hapten or to antibodies which recognize the hapten
and then observing the site of the label. Alternatively, the antibodies can be visualized using
secondary antibodies or fragments thereof that are specific for the primary antibody used.
Polyclonal and monoclonal antibodies may be used. Antibody fragments include Fab, F(ab')2,
Fd and antibody fragments which include a CDR3 region. The conjugates can also be labeled
using dual specificity antibodies.

In still another embodiment, the polymer is labeled with a sequence independent label,
including backbone labels. If the polymer is a nucleic acid, the sequence independent label is
referred to as a nucleic acid stain. Nucleic acid stains can be intercalating dyes such as
phenanthridines and acridines (e.g., ethidium bromide, propidium iodide, hexidium iodide,
dihydroethidium, ethidium homodimer-1 and -2, ethidium monoazide, and ACMA); minor
groove binders such as indoles and imidazoles (e.g., Hoechst 33258, Hoechst 33342, Hoechst
34580 and DAPI); and miscellaneous nucleic acid stains such as acridine orange (also capable
of intercalating), 7-AAD, actinomycin D, LDS751, and hydroxystilbamidine. All of the
aforementioned nucleic acid stains are commercially available from suppliers such as
Molecular Probes, Inc. Still other examples of nucleic acid stains include the following dyes
from Molecular Probes: cyanine dyes such as SYTOX Blue, SYTOX Green, SYTOX Orange,
POPO-1, POPO-3, YOYO-1, YOYO-3, TOTO-1, TOTO-3, JOJO-1, LOLO-1, BOBO-1,
BOBO-3, PO-PRO-1, PO-PRO-3, BO-PRO-1, BO-PRO-3, TO-PRO-1, TO-PRO-3,
TO-PRO-5, JO-PRO-1, LO-PRO-1, YO-PRO-1, YO-PRO-3, PicoGreen, OliGreen, RiboGreen,
SYBR Gold, SYBR Green I, SYBR Green II, SYBR DX, SYTO-40, -41, -42, -43, -44, -45
(blue), SYTO-13, -16, -24, -21, -23, -12, -11, -20, -22, -15, -14, -25 (green), SYTO-81, -80,
-82, -83, -84, -85 (orange), SYTO-64, -17, -59, -61, -62, -60, -63 (red).

The unit specific marker and the extrinsic label are conjugated or linked to each other.
Extrinsic labels can be linked or conjugated to the unit specific marker by any means known
in the art. For example, the labels may be attached directly to the unit specific marker or
attached to a linker which is attached to the unit specific marker. Unit specific markers can be
chemically derivatized to include linkers or to facilitate binding to linkers in order to enhance
this process. For instance, fluorophores have been directly incorporated into nucleic acids by
chemical means but have also been introduced into nucleic acids through active amino or thiol
groups introduced into nucleic acids (Proudnikov and Mirabekov, Nucleic Acids
Research, 24:4535-4542, 1996). An extensive description of modification procedures that can
be performed on the marker, the linker and/or the label can be found in Hermanson, G.T.,
Bioconjugate Techniques, Academic Press, Inc., San Diego, 1996, which is hereby
incorporated by reference.

There are several known methods of direct chemical labeling of DNA (Hermanson, 
1996; Roget et al., 1989; Proudnikov and Mirabekov, 1996). One of the methods is based on 
the introduction of aldehyde groups by partial depurination of DNA. Fluorescent labels with 
an attached hydrazine group are efficiently coupled with the aldehyde groups and the 
hydrazine bonds are stabilized by reduction with sodium labeling efficiencies around 60%. 
The reaction of cytosine with bisulfite in the presence of an excess of an amine fluorophore 
leads to transamination at the N4 position (Hermanson, 1996). Reaction conditions such as 
pH, amine fluorophore concentration, and incubation time and temperature affect the yield of 
products formed. At high concentrations of the amine fluorophore (3M), transamination can 
approach 100% (Draper and Gold, 1980). 

In addition to the above method, it is also possible to synthesize nucleic acids de novo 
(e.g., using automated nucleic acid synthesizers) using fluorescently labeled nucleotides. 
Such nucleotides are commercially available from suppliers such as Amersham Pharmacia 
Biotech, Molecular Probes, and New England Nuclear/Perkin Elmer. 

Light emissive compounds can be attached to unit specific markers by any
mechanism known in the art. For instance, functional groups which are reactive with various
light emissive groups include, but are not limited to, (functional group: reactive group of light
emissive compound) activated ester:amines or anilines; acyl azide:amines or anilines; acyl
halide:amines, anilines, alcohols or phenols; acyl nitrile:alcohols or phenols; aldehyde:amines
or anilines; alkyl halide:amines, anilines, alcohols, phenols or thiols; alkyl sulfonate:thiols,
alcohols or phenols; anhydride:alcohols, phenols, amines or anilines; aryl halide:thiols;
aziridine:thiols or thioethers; carboxylic acid:amines, anilines, alcohols or alkyl halides;
diazoalkane:carboxylic acids; epoxide:thiols; haloacetamide:thiols; halotriazine:amines,
anilines or phenols; hydrazine:aldehydes or ketones; hydroxyamine:aldehydes or ketones;
imido ester:amines or anilines; isocyanate:amines or anilines; and isothiocyanate:amines or
anilines.

The labeled polymer is exposed to an energy source in order to generate a signal from 
the label. As used herein, the labeled polymer is "exposed" to an energy source by 




WO 03/025540 PCT/US02/29687 


positioning or presenting the labeled unit specific marker bound to the polymer in interactive 
proximity to the energy source such that energy transfer can occur from the energy source to 
the labeled unit specific marker, thereby producing a detectable signal. Interactive proximity 
means close enough to permit the interaction or change which yields that detectable signal. 

The energy source may be selected from the group consisting of electromagnetic 
radiation, and a fluorescence excitation source, but is not so limited. "Electromagnetic 
radiation" as used herein is energy produced by electromagnetic waves. Electromagnetic 
radiation may be in the form of a direct light source or it may be emitted by a light emissive 
compound such as a donor fluorophore. "Light" as used herein includes electromagnetic 
energy of any wavelength including visible, infrared and ultraviolet. A fluorescence 
excitation source as used herein is any entity capable of making a source fluoresce or give rise 
to photonic emissions (i.e. electromagnetic radiation, directed electric field, temperature, 
physical contact, or mechanical disruption.) 

In one aspect, the method further involves exposing the labeled polymer to a station to 
produce distinct signals arising from the labels of the unit specific markers. As used herein, a 
labeled polymer is "exposed" to a station by positioning or presenting the labeled unit specific 
marker bound to the polymer in interactive proximity to the station such that energy transfer 
or a physical change in the station can occur, thereby producing a detectable signal. A 
"station" as used herein is a region where a portion of the polymer (having a labeled unit 
specific marker bound thereto) is exposed to an energy source in order to produce a signal or 
polymer dependent impulse. The station may be composed of any material including a gas, 
but preferably the station is a non-liquid material. In one preferred embodiment, the station is 
composed of a solid material. If the labeled unit specific marker interacts with the energy
source at the station, then it is referred to as an interaction station. An "interaction station" is 
a region where a labeled unit specific marker and the energy source can be positioned in close 
enough proximity to each other to facilitate their interaction. The interaction station for 
fluorophores is that region where the labeled unit specific marker and the energy source are 
close enough to each other that they can energetically interact to produce a signal. 

When the labeled unit specific markers are sequentially exposed to the station and/or
the energy source, the marker (and thus polymer) and the station and/or the energy source
move relative to each other. As used herein, when the marker and the station and/or energy
source move relative to each other, this means that either the marker (and thus polymer) and the
station and/or the energy source are both moving, or alternatively only one of the two is
moving and the other is stationary. Movement between the two can be accomplished by any
means known in the art. As an example, the marker and polymer can be drawn past a
stationary station by an electric current. Other methods for moving the marker and polymer
past the station include but are not limited to magnetic fields, mechanical forces, flowing
liquid medium, pressure systems, suction systems, gravitational forces, and molecular motors
(e.g., DNA polymerases or helicases if the polymer is a nucleic acid, and myosin when the
polymer is a peptide such as actin). Polymer movement can be facilitated by use of channels,
grooves, or rings to guide the polymer. The station is constructed to sequentially receive the
target polymer (with labeled unit specific markers bound thereto) and to allow the interaction
of the label and the energy source.

The interaction station in a preferred embodiment is a region of a nanochannel where a
localized energy source can interact with a polymer passing through the channel. The point
where the polymer passes the localized region of agent is the interaction station. As each
labeled unit specific marker passes by the energy source a detectable signal is generated. The
energy source may be a light source which is positioned a distance from the channel but
which is capable of transporting light directly to a region of the channel through a waveguide.
An apparatus may also be used in which multiple polymers are transported through multiple
channels. The movement of the polymer may be assisted by the use of a groove or ring to
guide the polymer.

Other arrangements for creating interaction stations are embraced by the invention.
For example, a polymer can be passed through a molecular motor tethered to the surface of a
wall or embedded in a wall, thereby bringing units of the polymer sequentially to a specific
location, preferably in interactive proximity to the energy source, thereby defining an
interaction station. A molecular motor is a compound such as polymerase or helicase which
interacts with the polymer and is transported along the length of the polymer past each unit.
Likewise, the polymer can be held stationary and a reader can be moved along the polymer,
the reader having attached to it the energy source. For instance, the energy source may be held
within a scanning tip that is guided along the length of the polymer. Interaction stations then
are created as the energy source is moved into interactive proximity to each labeled unit
specific marker.

As discussed earlier, many methods may be used to move the polymer linearly across
the channel and past the interaction station or signal generation station. A preferred method
according to the invention utilizes an electric field. An electric field can be used to pull a
polymer through a channel because the polymer becomes stretched and aligned in the
direction of the applied field, as has previously been demonstrated in several studies
(Bustamante, 1991; Gurrieri et al., 1990; Matsumoto et al., 1981). The most closely related
experiments regarding linear crossing of polymers through channels are experiments in
which polymeric molecules are pulled through protein channels with electric fields, as
described in Kasianowicz et al., 1996 and Bezrukov et al., 1994, each of which is hereby
incorporated by reference.

In order to achieve optimal linear crossing of a polymer across a channel it is
important to consider the channel diameter as well as the method used to direct the linear
crossing of the polymer, e.g., an electric field. The diameter of the channels should
correspond well with that of the labeled polymer. The theory for linear crossing is that the
diameter of the channels corresponds well with that of the polymer. For example, the ring-like
sliding clamps of DNA polymerases have internal diameters that correspond well with the
diameter of double-stranded DNA and are successful at achieving linear crossing of a DNA
molecule. Many kilobases of DNA can be threaded through the sliding clamps. Several
references also have demonstrated that linear crossing of DNA through channels occurs when
the diameter of the channels corresponds well with the diameter of the DNA
(Bustamante, 1991; Gurrieri et al., 1990; Matsumoto et al., 1981).

Single-stranded DNA, as used in the experiment, has a diameter of ~1.6 nm. A
channel having an internal diameter of approximately 1.7 to 3 nm is sufficient to allow linear
crossing of a single-stranded DNA molecule. The diameters of the channel and the DNA need
not match exactly but it is preferred that they be similar. For double-stranded DNA, which has
a diameter of 3.4 nm, channel sizes between 3.5 nm and 4.5 nm are sufficient to allow linear
crossing.
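
The channel sizing stated above can be expressed as a simple range check. The following Python sketch is illustrative only; the dictionary layout and function name are hypothetical, and the numerical ranges are those given in the preceding paragraph rather than independently derived values.

```python
# Illustrative sketch: names are hypothetical; ranges are taken from the text.
from typing import Dict

CHANNEL_RANGES_NM: Dict[str, Dict[str, float]] = {
    # polymer diameter and the channel diameter range stated as sufficient
    "ssDNA": {"polymer_diameter": 1.6, "channel_min": 1.7, "channel_max": 3.0},
    "dsDNA": {"polymer_diameter": 3.4, "channel_min": 3.5, "channel_max": 4.5},
}


def channel_permits_linear_crossing(polymer: str, channel_diameter_nm: float) -> bool:
    """Return True if the channel diameter lies within the stated range."""
    if polymer not in CHANNEL_RANGES_NM:
        raise KeyError(f"no channel range recorded for polymer {polymer!r}")
    limits = CHANNEL_RANGES_NM[polymer]
    return limits["channel_min"] <= channel_diameter_nm <= limits["channel_max"]


if __name__ == "__main__":
    for polymer, diameter in [("ssDNA", 2.0), ("ssDNA", 3.5), ("dsDNA", 4.0)]:
        ok = channel_permits_linear_crossing(polymer, diameter)
        print(f"{polymer} in a {diameter} nm channel: {ok}")
```

A channel narrower than the polymer excludes it, while a channel much wider than the polymer no longer constrains it to a linear path, which is why the check uses both a lower and an upper bound.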

The interaction station uses unique arrangements and geometries that allow the
localized radiation spot to interact with one or several polymer units or unit specific marker
labels that are on the order of nanometers or smaller. An optical detector detects light modified
by the interaction and provides a detection signal to a processor.

As the labeled polymer passes through the interaction station, the optical source emits
radiation (an electric or electromagnetic field, X-ray radiation, or visible or infrared radiation)
for characterizing the polymer passing through the interaction station. The radiation is
directed to an optical component of the interaction station. The optical component produces a
localized radiation spot that interacts directly with a) the polymer backbone (e.g., when the
polymer backbone is bound to an intercalator that emits radiation), b) labels attached to the
unit specific markers, or c) both the backbone units and the labels. The localized radiation
spot includes a non-radiating near field or an evanescent wave, localized in at least one
dimension. The localized radiation spot provides a much higher resolution than the
diffraction-limited resolution used in conventional optics.

The interaction between the labeled unit specific marker and the agent can take a
variety of forms. As a first example, the interaction can take place between an energy source
that is electromagnetic radiation and a labeled unit specific marker that is a light emissive
compound (preferably, a unit specific marker that is extrinsically labeled with a light emissive
compound). When the light emissive compound is exposed to the electromagnetic radiation
(such as by a laser beam of a suitable wavelength or electromagnetic radiation emitted from a
donor fluorophore), the electromagnetic radiation causes the light emissive compound to emit
electromagnetic radiation of a specific wavelength. A second type of interaction involves an
energy source that is a fluorescence excitation source and a unit specific marker that is labeled
with a light emissive compound. When the light emissive unit is contacted with the
fluorescence excitation source, the fluorescence excitation source causes the light emissive
compound to emit electromagnetic radiation of a specific wavelength. In both examples, the
signal that is measured exhibits a characteristic pattern of light emission, indicating that a
particular unit of the polymer is present at that particular location.

A variation of these types of interaction involves the presence of a third element of the
interaction, a proximate compound which is involved in generating the signal. For example, a
unit specific marker may be labeled with a light emissive compound which is a donor
fluorophore and a proximate compound can be an acceptor fluorophore. If the light emissive
compound is placed in an excited state and brought proximate to the acceptor fluorophore,
then energy transfer will occur between the donor and acceptor, generating a signal which can
be detected as a measure of the presence of the unit specific marker which is light emissive.
The light emissive compound can be placed in the "excited" state by exposing it to light (such
as a laser beam) or by exposing it to a fluorescence excitation source.

A set of interactions parallel to those described above can be created in which the light
emissive compound is the proximate compound and the labeled unit specific marker is an
acceptor source. In these instances the energy source is electromagnetic radiation emitted by
the proximate compound, and the signal is generated by bringing the labeled unit specific
marker in interactive proximity with the proximate compound.



The mechanisms by which each of these interactions produce detectable signals are
known in the art. PCT applications WO98/35012 and WO00/09757, published on August 13,
1998 and February 24, 2000 respectively, and U.S. Patent 6,355,420 B1 issued March 12,
2002, describe the mechanism by which a donor and acceptor fluorophore interact according
to the invention to produce a detectable signal including practical limitations which are known
to result from this type of interaction and methods of reducing or eliminating such limitations.

In some embodiments, the system also provides for polymer alignment. The polymer
alignment station and the interaction station include a substrate, a quartz wafer, and a glass
cover, which is optional. The substrate is machined from a non-conducting, chemically inert
material, such as Teflon® or Delrin®, to facilitate a flow of conducting fluid (for example,
agarose gel) and the target polymer. The substrate includes trenches machined to receive gold
wires, which have a selected shape in accordance with the shape of the electric field used for
advancing polymer molecules across the interaction station. The quartz wafer is sealed onto
the substrate.

Alternatively, the trenches and wires may be replaced by metallic regions located
directly on the quartz wafer, or may be replaced by external electrodes for creating the electric
field. In general, the electrodes are spaced apart over a distance in the range of about
millimeter to 5 centimeters, and preferably 2 centimeters, and typically provide field strengths
of about 20 V/cm.
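
The electrode spacing and field strength given above imply an applied voltage, which the following Python sketch estimates. It is illustrative only: it assumes an idealized uniform field between the electrodes (V = E x d), which a shaped-wire geometry would only approximate, and the function name is hypothetical.

```python
# Illustrative sketch: assumes a uniform field between the electrodes.
def electrode_voltage(field_v_per_cm: float, spacing_cm: float) -> float:
    """Return the voltage (V) needed for a uniform field across the spacing."""
    if field_v_per_cm < 0 or spacing_cm <= 0:
        raise ValueError("field must be non-negative and spacing positive")
    return field_v_per_cm * spacing_cm


if __name__ == "__main__":
    # Preferred spacing of 2 cm at about 20 V/cm.
    print(electrode_voltage(20.0, 2.0), "V")
    # Upper end of the stated spacing range, 5 cm, at the same field strength.
    print(electrode_voltage(20.0, 5.0), "V")
```

Under this idealization the preferred 2 centimeter spacing corresponds to about 40 V, and the 5 centimeter spacing to about 100 V.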

The alignment station and the interaction station may be fabricated together on a
quartz wafer. Of course, a single quartz wafer may include hundreds or thousands of the
alignment and interaction stations. In some important embodiments, the quartz wafer
includes a quartz substrate covered with a metal layer (e.g., aluminum, gold, silver) and
having a microchannel fabricated on the surface. Fabricated through the metal layer are slits
that form the optical elements that provide the localized radiation spot. These slits have a
selected width in the range between 1 nm and 5000 nm, and preferably in the range between 1
nm and 500 nm, and more preferably in the range between 10 nm and 100 nm. The slits are
located across from the microchannel having a width in the range of 1 micrometer to 50
micrometers and a length of several hundred micrometers. The electric field, created by the
gold wires, pulls a polymer (such as a DNA molecule) through the microchannel past the slits.

The polymer alignment station includes several alignment posts in several regions that
are connected via transition regions to the microchannel. The alignment posts have a circular
cross-section, are about 1 micron in diameter, are spaced about 1.5 microns apart and located
about 5 µm to 500 µm (and preferably about 10 µm to 200 µm) from the microchannel
depending on the length of the examined polymer. For example, when the polymer is
bacteriophage T4 DNA, which has about 167,000 base pairs, the alignment posts are located
about 30 µm from the nanoslit. In general, the distance from the nanoslit is about one half of
the expected length of polymer.
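
The rule that the posts sit about one half of the expected polymer length from the nanoslit can be checked against the T4 example. The following Python sketch is illustrative only. It assumes the textbook rise of about 0.34 nm per base pair for B-form DNA, a value not stated in this document, and the function names are hypothetical.

```python
# Illustrative sketch: the 0.34 nm/bp rise is a textbook value for B-form DNA
# and is an assumption of this example, not a figure from the text.
RISE_PER_BASE_PAIR_NM = 0.34


def contour_length_um(base_pairs: int) -> float:
    """Estimate the fully extended length (um) of double-stranded DNA."""
    if base_pairs <= 0:
        raise ValueError("base_pairs must be positive")
    return base_pairs * RISE_PER_BASE_PAIR_NM / 1000.0


def post_distance_um(base_pairs: int) -> float:
    """Distance from posts to nanoslit: one half of the expected length."""
    return 0.5 * contour_length_um(base_pairs)


if __name__ == "__main__":
    t4_base_pairs = 167_000
    print(f"T4 contour length: {contour_length_um(t4_base_pairs):.1f} um")
    print(f"post distance:     {post_distance_um(t4_base_pairs):.1f} um")
```

For 167,000 base pairs the estimate is a contour length of about 57 µm and a post distance of about 28 µm, in line with the figure of about 30 µm given for T4 DNA.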

In a preferred embodiment, the polymers are aligned and stretched before they reach 
the interaction station. The alignment station includes a triangular microchannel, a micropost 
region, and an entrance region, all fabricated on the surface. 

The entrance region is about 50 microns wide and is in communication with the
micropost region. The micropost region includes several alignment posts. The alignment 
posts have a circular cross-section and are about 1 micron in diameter. The alignment 
microposts are spaced about 1.5 microns apart in 12 to 15 rows. The micropost region is 
canted at about 26.6 degrees. 

The microposts are located about 100 µm to 5,000 µm (and preferably about 1,000 µm
to 3,000 µm) from the interaction station, where the units of the polymer (e.g., DNA) interact
with optical radiation. The microchannel is a region of constant x-direction shear that
maintains the polymer in extended conformation after release from the microposts. The
electric field pulls the examined polymer through the microchannel.

A very effective technique of stretching a polymer (e.g., DNA) uniformly is to have an 
obstacle field inside the tapered microchannel, followed by a constant-shear section to 
maintain the stretching obtained and straighten out any remaining coiling in the polymer. The 
preferred embodiment is a structure that combines microposts with two regions of different 
funnel designs. Pressure flow is the preferred driving force because of the predictable 
behavior of fluid bulk flow. 

The light beam emitted from the optical source interacts with the nanoslit formed in
the metal layer to produce a localized radiation spot. The laser beam, which has a diameter
many times larger than the width of the nanoslit, irradiates the back side of the quartz wafer,
propagates through the quartz wafer and interacts with the nanoslit. The localized radiation
spot, which is a non-radiating near field, irradiates sequentially the units of the polymer chain
as it is pulled through the microchannel. The localized radiation spot may be understood as
an evanescent wave emitted from the nanoslit. Because the width of the nanoslit is smaller
than the wavelength of the light beam, the radiation is in the Fresnel mode.

The optical system may also include a polarizer placed between the optical source and
the quartz wafer, and a notch filter, placed between the quartz wafer and the optical detector.
When the polarizer orients the light beam with the E vector parallel to the length of the
nanoslit, there is near-field radiation emitted from the nanoslit and no far-field radiation.
When the polarizer orients the light beam with the E vector perpendicular to the nanoslit (which
is many wavelengths long), there is far-field emission from the nanoslit. By selectively
polarizing the incident beam, the optical system can switch between the near-field and
far-field emissions.

In a representative system in which unit specific markers are detected by using unit
specific markers labeled with a fluorophore, the optical system includes a laser source, an
acousto-optic tunable filter, a polarizer, a notch filter, an intensifier and a CCD detector, and a
video monitor connected to a video recorder (VCR). The individual units within the target
polymer are detected via the unit specific marker that is selectively labeled with a fluorophore
sensitive to a selected excitation wavelength. The acousto-optic tunable filter is used to
select the excitation wavelength of the light emitted from the laser source. The excitation
beam interacts with the nanoslit to create the non-radiating near-field. The electric field
between the gold wires pulls the polymer at a known rate, causing interaction of each labeled
unit with radiation. As the fluorophore moves past the slits, the emitted radiation excites the
fluorophore and it, in turn, re-emits fluorescent radiation. The notch filter allows the
radiation having the fluorescent wavelength to pass and attenuates the radiation having the
excitation wavelength. This serves to increase the signal-to-noise ratio, and this latter
use of optical filters is known in the art. The charge-coupled device (CCD) detector, located
a few millimeters to a few centimeters above the quartz wafer, detects the fluorescent
radiation. The CCD detector can detect fluorescent radiation separately for each of the many
nanoslits as the fluorophore moves across them. This process can potentially occur at a large
number of nanoslits located on the quartz wafer.
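
Because the polymer is pulled past the slit at a known rate, the time at which each fluorescent burst is detected can be converted into a position along the polymer. The following Python sketch is one illustrative way to do so and is not the disclosed signal processing. It assumes a uniformly sampled intensity trace from a single nanoslit, a constant velocity, and a fixed detection threshold; all names and numerical values are hypothetical.

```python
# Illustrative sketch: threshold detection on a uniformly sampled trace.
# All names and the example numbers are assumptions for this example.
from typing import List, Sequence


def _weighted_center(indices: Sequence[int], trace: Sequence[float]) -> float:
    """Intensity-weighted mean sample index of one above-threshold burst."""
    total = sum(trace[i] for i in indices)
    return sum(i * trace[i] for i in indices) / total


def label_positions_um(
    trace: Sequence[float],
    sample_interval_s: float,
    velocity_um_per_s: float,
    threshold: float,
) -> List[float]:
    """Return the position (um from the first sample) of each detected burst."""
    if sample_interval_s <= 0 or velocity_um_per_s <= 0 or threshold < 0:
        raise ValueError("interval and velocity must be positive, threshold >= 0")
    positions: List[float] = []
    burst: List[int] = []  # indices of the current run of above-threshold samples
    for i, value in enumerate(trace):
        if value > threshold:
            burst.append(i)
        elif burst:
            center = _weighted_center(burst, trace)
            positions.append(center * sample_interval_s * velocity_um_per_s)
            burst = []
    if burst:  # trace ended while still above threshold
        center = _weighted_center(burst, trace)
        positions.append(center * sample_interval_s * velocity_um_per_s)
    return positions


if __name__ == "__main__":
    trace = [0, 0, 5, 9, 5, 0, 0, 0, 4, 8, 4, 0]
    print(label_positions_um(trace, 0.001, 100.0, 2.0))
```

In this example the two bursts are centered on samples 3 and 9, so at 1 ms per sample and 100 µm/s the labels are estimated to lie about 0.6 µm apart along the polymer.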

The electric field may be used to position the target polymer close to the nanoslit(s).
The nanoslit "emits" the non-radiating field, which is attenuated over a distance of only one or
two wavelengths. To position the fluorophore within the range of the non-radiating field, it
may be necessary to pull the polymer closer to the nanoslit and thus
closer to the metal layer. The polymer is pulled closer to the nanoslit using dielectric forces
created by applying an alternating current (AC) field to the metal layer. See, e.g., "Trapping of
DNA in Nonuniform Oscillating Electric Fields," by Charles L. Asbury and Ger van den
Engh, Biophysical Journal Vol 74, pp 1024-1030 (1998), "Molecular Dielectrophoresis of
Biopolymers," by M. Washizu, S. Suzuki, O. Kurosawa, T. Nishizaka, and T. Shinohara, in
IEEE Transactions on Industry Applications, Vol 30, No 4, pp. 835-843 (1994), and
"Electrostatic Manipulation of DNA in Microfabricated Structures," by M. Washizu and O.
Kurosawa, in IEEE Transactions on Industry Applications, Vol 26, No 6, pp. 1165-1172
(1990). In general, see "Dielectrophoresis: The Behavior of Neutral Matter in Nonuniform
Electric Fields," by Pohl, H. A., Cambridge University Press, Cambridge, UK, 1978. The
inhomogeneous field will attract polarized units of polymer (e.g., DNA molecule) to the metal
layer.

The system can optionally contain a second interaction station that can measure ionic
current across a nanochannel as linearized polymer molecules approach the nanochannel and
pass through. The detected blockages of the ionic current can be used to characterize the
length of the polymer molecules as well as other polymer characteristics. The interaction
station can apply a transchannel voltage using electrodes in a direction perpendicular to paired
electrodes to draw the polymer molecules through a channel. The paired electrodes are
connected to a microampere meter, located in a controller, and this arrangement serves to
measure the ionic current across the nanochannel. Alternatively, the microampere meter is
replaced by a bridge, which compares the impedance of the channel in the absence (Zi) and in
the presence (Z*) of the polymer. When the polymer is absent from the channel, the voltmeter
measures 0 V. As the extended, nearly linear polymer passes through the channel, its
presence detectably reduces, or completely blocks, the normal ionic flow between the
electrodes.

The paired electrodes are fabricated using submicron lithography and are connected
either to the bridge (to detect changes in the impedance) or to the microampere meter (to
measure the ionic current). The measured data across the channel are amplified, the
amplified signal is filtered using a low pass filter, and the
data are digitized at a selected sampling rate (e.g., 64,000 samples per second) by an
analog-to-digital converter. The system
controller or processor correlates the transient decrease in the ionic current with the speed of
the polymer units and determines the length of the polymer, for example the length of a DNA
or RNA molecule.
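
The length determination described above can be sketched as follows. This is only an illustrative sketch: it assumes the digitized current is available as a list of samples, that a single threshold separates open-channel from blocked current, and that the polymer moves at a known constant velocity. The function name, threshold, and all numeric values are hypothetical and do not come from the specification.

```python
def blockade_length_um(current_trace, sample_rate_hz, threshold, velocity_um_per_s):
    """Estimate polymer length from the longest run of samples below threshold.

    All parameter names and units are illustrative assumptions.
    """
    longest = run = 0
    for sample in current_trace:
        # Extend the current blockade run, or reset it when current recovers.
        run = run + 1 if sample < threshold else 0
        longest = max(longest, run)
    duration_s = longest / sample_rate_hz      # blockade duration
    return duration_s * velocity_um_per_s      # length = duration x velocity


# Hypothetical trace: open-channel current 1.0, blocked current 0.4 (arbitrary units).
trace = [1.0] * 100 + [0.4] * 640 + [1.0] * 100
length_um = blockade_length_um(trace, sample_rate_hz=64_000, threshold=0.7,
                               velocity_um_per_s=1_000.0)
```

With these made-up numbers the blockade lasts 640 samples at 64,000 samples per second, i.e. 10 ms, which at 1,000 µm/s corresponds to about 10 µm of polymer.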

The fabrication of the alignment region, the microchannel and the slits has been
described before in Published PCT Application No. WO 00/09757.

An optical system for detecting near field and far field radiation emitted from the
nanochannel can also be used. In this system, the optical source emits the light beam, which
is focused onto the input side of the waveguide using techniques described below. After the
interaction of the evanescent waves with the polymer, the near field radiation is collected by
the waveguide and optically coupled to the optical detector from the output side. The far
field radiation is collected by a lens, filtered by a tunable filter and provided to a PMT detector. An
optical source, such as an LED or a laser diode, may be incorporated onto the quartz wafer.
This arrangement would eliminate the need for an external optical source, which has to be
aligned with an input side. The optical sources are made using a direct bandgap material, for
example GaN for generating UV radiation, or GaP:N for generating radiation of a green
wavelength.

A quartz wafer may also include an integrated optical detector in order to avoid 
external setup for detection and filtering. An integrated avalanche photodiode or a PIN 
photodiode, together with an in situ filter for filtering out the excitation wavelength, receives 
the light beam. Various integrated optical elements are described in "Integrated 
Optoelectronics - Waveguide Optics, Photonics, Semiconductors," by Karl Joachim Ebeling, 
Springer-Verlag, 1992. For example, a corrugated waveguide is used as a contradirectional
coupler so that light within a narrow frequency band will be reflected back resulting in a 
filtering action. Another filter is made using two waveguides with different dispersion 
relations in close proximity. Light from one waveguide will be coupled into the other for 
wavelengths for which there is a match in the index of refraction. By applying a voltage to
the waveguides, the dispersion curve is shifted and the spectrum of the resulting filter is
altered, providing a tunable filter.

In another embodiment, the optical system uses radiation modulated at frequencies in
the range of 10 MHz to 1 GHz as described above.

Different types of coupling of light from an external optical source into a waveguide
can be used. For example, the light source emits a light beam, which is focused onto the
input side of a triangular waveguide using a focusing lens. Alternatively, a prism is used to
couple a light beam into a triangular waveguide. A light beam is diffracted by a prism and
undergoes total internal reflection inside the prism. The prism is located on the surface of a
SiO2 volume and is arranged to optically couple a beam across a layer into a waveguide.
Alternatively, a diffraction grating is used to couple a light beam into a triangular waveguide.
A grating is fabricated on a waveguide so that it diffracts the light beam toward the tip.
Alternatively, an optical fiber couples a light beam to a triangular waveguide. Different ways
to couple light into a waveguide are described in Fundamentals of Optics, by Clifford R.
Pollock, Richard D. Irwin, Inc., 1995.

Waveguide fabrication is described in Published PCT Patent Application 
PCT/US99/18438, filed on August 13, 1999. 

Another embodiment of the present invention utilizes confocal fluorescence 
illumination and detection. Confocal illumination allows a small optical volume (on the order 
of picoliters) to be illuminated. Both Rayleigh and Raman scattering are minimized using a
small probe volume. The optical apparatus includes a light source, a filter, a dichroic mirror, 
an objective, a narrow band pass filter, a pinhole, a lens, and a detector. The light source, 
which is a 1 mW argon ion laser, emits a laser beam, which passes through a filter. The filter 
is a laser line filter that provides a focused beam of a wavelength of about 514 nm. The 
filtered beam is reflected by a dichroic mirror and is focussed by an objective onto a region of 
a polymer such as a DNA molecule. The objective is a 100x 1.2 NA oil immersion objective.

The excited tag provides a fluorescence emission that is passed through a dichroic
mirror and a narrow bandpass filter (e.g., manufactured by Omega Optical) and is focused
onto a 100 µm pinhole. The fluorescent light is focussed by an aspheric lens onto the
detector, which is an avalanche photodiode (e.g., manufactured by EG&G Canada) operating
in the photon counting mode. The output signal from the photodiode is collected by a
multichannel scaler (EG&G) and analyzed using a general purpose computer.

The confocal apparatus is appropriate for quantitative applications involving
time-of-flight. Such applications include measuring distances on the DNA, detecting tagged
sequences, and determining degrees of stretching in the DNA. Single fluorescent molecules
can be detected using the apparatus. Alternatively, an imaging apparatus uses an intensified
CCD (ICCD, Princeton Instruments) mounted on a microscope.

According to the methods described herein, each analysis intends to capture or detect 
preferably two or more detectable signals. As described herein, a first unit specific marker 
can interact with the energy source to produce a first signal and a second unit specific marker 
can interact with the energy source to produce a second signal. The signals so produced are 
different and thus distinct from one another. Distinct signals as used herein refer to signals 
which can be differentiated from one another. This enables more than one type of unit to be 
detected on a single target polymer. This also enables more thorough sequencing of a
target polymer since units located at distances smaller than the resolution limit of prior art
approaches can now be detected separately and their positions can be distinguished and thus
mapped along the length of the polymer.

Once the signal is generated it can then be detected. The particular type of detection
means will depend on the type of signal generated, which of course will depend on the type of
interaction which occurs between the unit and the energy source. Most of the interactions
involved in the method will produce an electromagnetic radiation signal. Many methods are
known in the art for detecting electromagnetic radiation signals. Preferred devices for
detecting signals are two-dimensional imaging systems that have, among other parameters,
low noise, high quantum efficiency, proper pixel-to-image correlation, and efficient
processing times. An example of a device useful for detecting signals is a two-dimensional
fluorescence imaging system which detects electromagnetic radiation in the fluorescent
wavelength range.

The detectable signals can be distinguished from each other by using multiple
detectors, each of which detects signals of a specific wavelength or of a narrow range of
wavelengths. In addition, signals can be resolved using various dichroic reflectors, mirrors
and/or band pass filters in the optical path to separate different emission wavelengths, each of
which is characteristic of a particular label (and thus a particular unit specific marker). The
configuration of detectors will govern the requirement and placement of such mirrors and
filters. Mirrors can be used to deflect signals below a particular wavelength towards low
wavelength detectors. Filters can be used to remove excitation wavelengths that are merely
scattered by the polymer. Bandpass filters allow wavelengths of a particular range to pass
through, and block all other wavelengths. Longpass filters allow wavelengths above a
particular set minimum to pass. It is within the skill of the ordinary artisan to determine the
placement of optical mirrors and filters along the length of the fluorescent beam radiating
from the labeled unit specific markers.
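
The bandpass and longpass behavior described above can be modeled in a few lines. This is a simplified sketch, not a description of any particular optical layout: filters are treated as ideal (sharp-edged), and the channel names and pass bands are hypothetical values chosen only so that an emission near 520 nm and one near 670 nm land in separate detectors.

```python
def bandpass(low_nm, high_nm):
    """Ideal bandpass filter: passes wavelengths inside [low_nm, high_nm]."""
    return lambda wavelength_nm: low_nm <= wavelength_nm <= high_nm


def longpass(min_nm):
    """Ideal longpass filter: passes wavelengths at or above min_nm."""
    return lambda wavelength_nm: wavelength_nm >= min_nm


# Hypothetical detector channels; the pass bands are illustrative assumptions.
CHANNELS = {
    "green": bandpass(510, 550),
    "red": longpass(650),
}


def route(wavelength_nm):
    """Return the names of the detector channels that would see this wavelength."""
    return [name for name, passes in CHANNELS.items() if passes(wavelength_nm)]


green_hit = route(520)   # reaches only the "green" channel
red_hit = route(670)     # reaches only the "red" channel
blocked = route(600)     # reaches neither channel
```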

The detectable signals so generated are captured by and preferably recorded by a
detection device, optionally at or within a detection station. As stated earlier, the detectable
signal produced by each labeled unit specific marker is indicative of that particular marker, its
sequence, and the corresponding complementary unit in the target polymer to which the
marker is bound. Signals are detected sequentially when signals from different markers are
detected spaced apart in time (and thus distance along the length of the target polymer). Not
all units need to be detected or need to generate a signal to detect signals "sequentially". The
temporal separation of the peak outputs from the detection channels, together with knowledge
of the velocity at which the polymer is moving past the station (or the velocity at which the
station is moving past the polymer), is used to calculate the distance between the two marker
positions.
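
The distance calculation just described (peak separation in time multiplied by velocity) can be sketched as below. The sketch assumes each detection channel is a list of intensity samples taken at a common sample rate, that each channel contains one dominant peak, and that the velocity is constant. Function names and the sample data are hypothetical.

```python
def peak_time_s(samples, sample_rate_hz):
    """Time of the largest sample in a channel, in seconds."""
    peak_index = max(range(len(samples)), key=samples.__getitem__)
    return peak_index / sample_rate_hz


def marker_separation_um(channel_a, channel_b, sample_rate_hz, velocity_um_per_s):
    """Distance between two markers = |t_b - t_a| x velocity."""
    dt_s = abs(peak_time_s(channel_b, sample_rate_hz)
               - peak_time_s(channel_a, sample_rate_hz))
    return dt_s * velocity_um_per_s


# Hypothetical two-channel recording: peaks at sample 2 and sample 7.
channel_a = [0, 1, 5, 1, 0, 0, 0, 0, 0, 0]
channel_b = [0, 0, 0, 0, 0, 0, 1, 6, 1, 0]
separation_um = marker_separation_um(channel_a, channel_b,
                                     sample_rate_hz=1000,
                                     velocity_um_per_s=500.0)
```

Here the peaks are 5 samples apart at 1,000 samples per second (5 ms), so at an assumed 500 µm/s the markers are about 2.5 µm apart.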

The invention is not limited in scope to the type of detection technology used. Rather,
the method described herein may be adapted to any system capable of detecting
sequence-specific tags on a linear polymer such as DNA. There are a number of detection schemes that
would lend themselves to this type of analysis, including optical and non-optical approaches.
These detection systems include, but are not limited to, electron spin resonance detection,
atomic force microscope (AFM) detection, scanning tunneling microscope (STM) detection,
optical detection, nuclear magnetic resonance (NMR) detection, near-field detection,
fluorescence resonance energy transfer (FRET) detection, an electrical detection system, a
photographic film detection system, a chemiluminescent detection system, an enzyme
detection system, and an electromagnetic detection system. As an example of a suitable
detection scheme, a scanning tunneling system could be used to analyze sequence information
from linear polymers provided that unit specific markers are labeled with compounds that are
distinguishable using the scanning tunneling system. Similarly, the detection technologies
described in PCT published patent applications WO98/35012 and WO00/09757, published on
August 13, 1998 and February 24, 2000, respectively, and in issued U.S. Patent 6,355,420 B1,
issued March 12, 2002, can be used in conjunction with their respective labeling technologies
for high-resolution linear analysis in accordance with the methods of the invention. The
entire contents of these patent applications are incorporated by reference herein in their
entirety.

The signals detected following the interaction of the energy source and the labeled
unit specific marker may be stored in a database for analysis. One method of analyzing the stored
signals is to align them in order to derive sequential linear sequence information about the
polymer. By running two or more analyses, all of which contain as a control the same labeled
unit specific marker, it is possible to combine the sequence information from the analyses,
thereby yielding even more information than could be achieved in a single analysis.
Another method for analyzing the stored signals is to compare the stored signals to a pattern
of signals from another polymer to determine the relatedness of the two polymers. Yet
another method for analyzing the detected signals is to compare the detected signals to a
known pattern of signals characteristic of a known polymer to determine the relatedness of the
polymer being analyzed to the known polymer. Comparison of signals is discussed in more
detail below.

In one aspect, the methods of the invention can be used to identify one, some, or all of
the units of the polymer. This is achieved by identifying the type of individual unit and its
position on the backbone of the polymer by determining whether a signal detected at that
particular position on the backbone is characteristic of the presence of a particular labeled
unit.

The methods of the invention also are useful for identifying other structural properties
of polymers. The structural information obtained by analyzing a polymer according to the
methods of the invention may include the identification of characteristic properties of the
polymer which (in turn) allows, for example, for the identification of the presence of a
polymer in a sample or a determination of the relatedness of polymers, identification of the
size of the polymer, identification of the proximity or distance between two or more
individual units of a polymer, identification of the order of two or more individual units
within a polymer, and/or identification of the general composition of the units of the polymer.
Such characteristics are useful for a variety of purposes such as determining the presence or
absence of a particular polymer in a sample. For instance, when the polymer is a nucleic acid,
the methods of the invention may be used to determine whether a particular genetic sequence
is expressed in a cell or tissue.

The presence or absence of a particular sequence can be established by determining
whether any polymers within the sample exhibit a characteristic pattern of individual units
which is only found in the polymer of interest, i.e., by comparing the detected signals to a
known pattern of signals characteristic of a known polymer to determine the relatedness of the
polymer being analyzed to the known polymer. The entire sequence of the polymer of
interest does not need to be determined in order to establish the presence or absence of the
polymer in the sample. Similarly, the methods may be useful for comparing the signals
detected from one polymer to a pattern of signals from another polymer to determine the
relatedness of the two polymers.
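
One simple way to picture such a pattern comparison is sketched below. The specification does not prescribe a comparison algorithm, so everything here is an assumption made for illustration: each pattern is represented as a list of (position in bp, label) pairs, a reference marker counts as matched when a detected marker with the same label lies within a chosen tolerance, and the score is the matched fraction of the reference pattern.

```python
def pattern_match_fraction(detected, reference, tolerance_bp):
    """Fraction of reference markers matched by a same-label detected marker.

    `detected` and `reference` are lists of (position_bp, label) tuples.
    This scoring rule is a hypothetical illustration, not the patent's method.
    """
    matched = 0
    for ref_pos, ref_label in reference:
        if any(label == ref_label and abs(pos - ref_pos) <= tolerance_bp
               for pos, label in detected):
            matched += 1
    return matched / len(reference)


# Hypothetical known pattern and a detected pattern from a sample polymer.
reference = [(1000, "red"), (1500, "green"), (4000, "red")]
detected = [(1020, "red"), (1490, "green"), (2500, "green")]
score = pattern_match_fraction(detected, reference, tolerance_bp=50)
```

In this made-up case two of the three reference markers are found, giving a score of about 0.67. A threshold on such a score could then be used to call presence or absence without determining the entire sequence.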

Once all of the detectable signals are generated, detected and stored in a database, the
signals can be analyzed to determine structural information about the polymer. The computer
used for this analysis may be the same computer used to collect data about the polymers, or may be a separate
computer dedicated to data analysis. A suitable computer system to implement the present
invention typically includes an output device which displays information to a user, a main unit
connected to the output device and an input device which receives input from a user. The
main unit generally includes a processor connected to a memory system via an
interconnection mechanism. The input device and output device also are connected to the
processor and memory system via the interconnection mechanism.

It should be understood that one or more output devices may be connected to the 
computer system. Example output devices include a cathode ray tube (CRT) display, liquid 
crystal displays (LCD), printers, communication devices such as a modem, and audio output. 
It should also be understood that one or more input devices may be connected to the computer 
system. Example input devices include a keyboard, keypad, track ball, mouse, pen and tablet, 
communication device, and data input devices such as sensors. It should be understood that the
invention is not limited to the particular input or output devices used in combination with the
computer system or to those described herein.

The computer system may be a general purpose computer system which is 
programmable using a high level computer programming language, such as C or C++. The 
computer system may also be specially programmed, special purpose hardware. In a general 
purpose computer system, the processor is typically a commercially available processor, of 
which the series x86 processors, available from Intel, and similar devices from AMD and 
Cyrix, the 680X0 series microprocessors available from Motorola, the PowerPC 
microprocessor from IBM and the Alpha-series processors from Digital Equipment 
Corporation, are examples. Many other processors are available. Such a microprocessor 
executes a program called an operating system, of which WindowsNT, UNIX, DOS, VMS 
and OS8 are examples, which controls the execution of other computer programs and 
provides scheduling, debugging, input/output control, accounting, compilation, storage 
assignment, data management and memory management, and communication control and 
related services. The processor and operating system define a computer platform for which 
application programs in high-level programming languages are written. 

A memory system typically includes a computer readable and writeable nonvolatile 
recording medium, of which a magnetic disk, a flash memory and tape are examples. The 
disk may be removable, known as a floppy disk, or permanent, known as a hard drive. A disk 
has a number of tracks in which signals are stored, typically in binary form, i.e., a form 
interpreted as a sequence of ones and zeros. Such signals may define an application program
to be executed by the microprocessor, or information stored on the disk to be processed by the
application program. Typically, in operation, the processor causes data to be read from the
nonvolatile recording medium into an integrated circuit memory element, which is typically a
volatile, random access memory such as a dynamic random access memory (DRAM) or static
memory (SRAM). The integrated circuit memory element allows for faster access to the
information by the processor than does the disk. The processor generally manipulates the data
within the integrated circuit memory and then copies the data to the disk when processing is
completed. A variety of mechanisms are known for managing data movement between the 
disk and the integrated circuit memory element, and the invention is not limited thereto. It 
should also be understood that the invention is not limited to a particular memory system. 

It should be understood that the invention is not limited to a particular computer platform,
particular processor, or particular high-level programming language. Additionally, the
computer system may be a multiprocessor computer system or may include multiple
computers connected over a computer network.

The data stored about the polymers may be stored in a database, or in a data file, in the 
memory system of the computer. The data for each polymer may be stored in the memory 
system so that it is accessible by the processor independently of the data for other polymers,
for example by assigning a unique identifier to each polymer. 
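
As a minimal sketch of the storage arrangement just described, the following keeps each polymer's signal data under its own unique identifier so that one record can be retrieved without touching the others. The class name, the use of an in-memory dictionary, and the (time, channel) record layout are all illustrative assumptions; a database or data file would serve the same role.

```python
import uuid


class PolymerStore:
    """In-memory store keyed by a unique identifier per polymer (illustrative)."""

    def __init__(self):
        self._records = {}

    def add(self, signals):
        """Store a copy of one polymer's signals and return its new identifier."""
        polymer_id = str(uuid.uuid4())
        self._records[polymer_id] = list(signals)
        return polymer_id

    def get(self, polymer_id):
        """Retrieve one polymer's signals independently of all other records."""
        return self._records[polymer_id]


store = PolymerStore()
# Hypothetical records: (time in seconds, detection channel) pairs.
first_id = store.add([(0.002, "red"), (0.007, "green")])
second_id = store.add([(0.004, "green")])
```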

The following examples are provided to illustrate specific instances of the practice of 
the present invention and are not intended to limit the scope of the invention. As will be 
apparent to one of ordinary skill in the art, the present invention will find application in a 
variety of compositions and methods.

Examples 

Example 1: Different fluorescent sequence-specific tags. 

There are many different types of fluorescent sequence-specific tags. These include
fluorescent tags that can be differentiated by their emission spectra. Examples of fluorescent
tags include standard dyes such as fluorescein, Cy3 and Cy5, all of which have different
fluorescence emission spectra that can be distinguished using standard spectral filtering
techniques. These techniques include dichroic mirrors, bandpass filters, notch filters, and
combinations thereof. Fluorescent tags having the same spectra can also be distinguished by
means of fluorescence lifetime determination. Fluorescence lifetime is the time between
excitation of a given fluorophore and its emission of a photon. For standard fluorophores such as
those described, the fluorescence lifetime is on the order of 1 nanosecond to 5 nanoseconds.
The use of two fluorophores with the same emission spectra but different lifetimes is another
approach to multiplex the number of sequence-specific tags in the system.
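
Lifetime-based discrimination of two spectrally identical tags can be sketched as follows. The sketch assumes a single-exponential decay, for which the mean of the measured excitation-to-emission delays is an estimate of the lifetime, and it ignores background counts and instrument response. The tag names, candidate lifetimes, and delay values are hypothetical.

```python
from statistics import mean


def estimate_lifetime_ns(delays_ns):
    """Estimate lifetime as the mean photon delay (single-exponential assumption)."""
    return mean(delays_ns)


def classify_by_lifetime(delays_ns, candidates_ns):
    """Return the candidate tag whose nominal lifetime is closest to the estimate."""
    tau = estimate_lifetime_ns(delays_ns)
    return min(candidates_ns, key=lambda name: abs(candidates_ns[name] - tau))


# Hypothetical tags with the same emission spectrum but different lifetimes.
candidates = {"tag_A": 1.0, "tag_B": 4.0}
short_delays = [0.8, 1.1, 0.9, 1.4, 0.8]   # mean is about 1 ns
long_delays = [3.5, 4.6, 2.9, 5.1, 4.2]    # mean is about 4 ns
short_tag = classify_by_lifetime(short_delays, candidates)
long_tag = classify_by_lifetime(long_delays, candidates)
```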

Example 2: Different scanning tunneling sequence-specific tags. 
The use of non-fluorescent based approaches for differential tagging and
high-resolution linear analysis may include the use of scanning tunneling tips for the analysis and
scanning of linear molecules such as DNA at high speeds. Using this technique, sequence
specific tags may include gold particles, silica particles, as well as nanocrystals. Since
scanning tunneling tips are capable of size discrimination, differential tagging approaches can
exploit different sized particles as probe labels.

Example 3: Different types of fluorophores suitable for use with monochromatic excitation
and multicolor detection systems. 

There are a number of different types of fluorophores that can be used in systems
comprising monochromatic excitation (e.g., single laser systems) and multicolor detectors.
For example, for fluorescence-based approaches, the tags that can be spectrally distinguished
based on their differential emission spectra include dyes such as Cascade Blue, Alexa dyes,
Cy dyes, tetramethylrhodamine (TAMRA), rhodamine-6G, infrared dyes, Texas Red™,
Oregon Green, and fluorescein, all of which are commercially available from a number of sources
including but not limited to Molecular Probes (Eugene, OR).

As stated in Example 2, size can also be used to differentiate tags. Scanning tunneling
based approaches can use semiconductor nanocrystals which yield size-dependent spectra.
For instance, a 4 nm CdSe nanocrystal yields a different spectral emission than a 3 nm CdSe
nanocrystal. Silica, gold, latex and ferritin particles can also be used in size discrimination
systems. Different tags may have different properties including electrical, magnetic, chemical,
and biological properties.


Example 4: Experimental apparatuses for fluorescence detection.

Various experimental apparatuses can be used in conjunction with the methods of the
invention in order to obtain sequence information from linear polymers such as DNA
molecules. Several experimental apparatuses are described in PCT published patent applications
WO98/35012 and WO00/09757, and in U.S. Patent No. 6,355,420 B1. These approaches are
used to elongate DNA, deliver it to multiple excitation regions, and detect fluorescence from
various excitation regions along the length of the DNA. Other approaches that are suitable
employ a molecular motor to read a strand of DNA, yet still detect fluorescence along the
length of the DNA using a detection system. These latter methods may employ a polymerase
or other enzyme or protein capable of scanning DNA as the molecular motor.

Physical detection systems may include CCD-based methods of imaging detection,
confocal detection, electrical detection, multi-color methods of detection, near-field analysis,
and FRET analysis. Other detection systems include single-color illumination methods such
as confocal systems using a single laser excitation wavelength. The single color wavelength
excites two different fluorescent entities which each have different emission wavelengths. A
single-color illumination system may be preferable in some instances, particularly since it
avoids the chromatic aberrations and parfocality problems that sometimes exist in dual
excitation systems. These problems can also be overcome directly in dual excitation systems
by using pre-aligned multiple wavelengths, such as is possible with multi-line argon-krypton
laser systems. Fiber optic coupling of multiple laser lines can also achieve the same purpose.
Confocal detection systems using these latter laser arrangements are suitable for use in the
methods of the invention.

Example 5: Sequence interrogation using multi-color enhanced resolution. 

Multi-color enhanced resolution methods can be applied to methods and probe sets
used to obtain sequence information from a polymer such as DNA. Depending on how
particular sequences of probes are labeled, different information can be obtained from the
DNA molecule. In the following Examples directed at DNA sequence analysis, a
nucleic acid probe is used as the sequence-specific agent. Binding of the probe to the target
DNA (i.e., the DNA intended to be analyzed) positions a fluorophore or a fluorochrome along
the length of the DNA. The methods of the invention employ any number of means or
methods for introducing, attaching or binding a fluorochrome or fluorophore to a
probe. Methods of fluorophore incorporation into or conjugation to a nucleic acid probe are
known to those of ordinary skill in the art. Accordingly, the method is not dependent upon
the method of probe labeling provided that such labeling does not differentially compromise
the binding capability of the probe to the target molecule.

In the simplest scenario, interrogation of a target sequence using two probes can be
performed using differentially labeled probes directed at different target sequences. An
example of this is shown in Figure 2. In Figure 2, a strand of DNA is bound at two adjacent
sites with two probes having different labels and specificity for different target sequences.
The two adjacent sites are separated by a distance below the resolution of the detection
zone of interest. In a confocal system, this sub-resolution regime lies
below the λ/2 diffraction limit of the confocal spot. If the light wavelength is 532 nm, then
this limit is 266 nm or 782 base-pairs of information.
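
The arithmetic behind these figures can be checked with a short calculation. The conversion factor of 0.34 nm per base pair is an assumption here (it is the standard rise per base pair of B-form DNA and is consistent with the 266 nm and 782 base-pair figures above); stretched or compacted DNA would use a different factor.

```python
def diffraction_limit(wavelength_nm, nm_per_bp=0.34):
    """Return the lambda/2 limit in nm and its equivalent in base pairs.

    nm_per_bp is an assumed conversion factor (B-form DNA rise per base pair).
    """
    limit_nm = wavelength_nm / 2
    return limit_nm, limit_nm / nm_per_bp


limit_nm, limit_bp = diffraction_limit(532)
whole_bp = int(limit_bp)   # truncate to whole base pairs
```

For a 532 nm wavelength this gives 266 nm, or roughly 782 base pairs.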

In some embodiments of the invention, and depending on the detection system used 
and the number of markers used, it is possible to detect markers (and thus, units) separated 
from each other by less than 750 bp, less than 700 bp, less than 650 bp, less than 600 bp, less 
than 550 bp, less than 500 bp, less than 450 bp, less than 400 bp, less than 350 bp, less than 
300 bp, less than 250 bp, less than 200 bp, less than 150 bp, less than 100 bp, or less than 50
bp. 

A second scenario involves using probes that have the same sequence (and thus the
same target sequence specificity). Figure 3 illustrates this situation. The distance between the
two target sites would normally be below the resolution limit of the optical system as used in
the prior art. If the target molecule is exposed to a solution of probes that share an identical
fluorescent tag, then these probes will bind to adjacent target sequences on the target molecule
and, because of their identical labels, cannot be resolved from one another.

If, however, the mixture of probes contains equal numbers of probes that are labeled
with two different labels, then this spatial limitation is overcome. For instance, if 50% of the
probe (or fluorescently tagged site) is labeled with a red fluorophore (Cy5) and 50% of the
probe is labeled with a green fluorophore (fluorescein), then because of the equimolar mixture
of the differently labeled probes, there is a 50% probability that the probe attached to any
one particular site is red and a 50% probability that it is green. This results in 2^2 (i.e., 4) possible
combinations of how such probes would bind to two adjacent sites. These possible
combinations of two distinctly labeled probes binding to two adjacent sites are illustrated in
Figure 4.

As illustrated in Figure 4, half of the possible combinations, i.e., those in which both
sites are bound by green labeled probes or both sites are bound by red labeled probes, will not
be resolved. The bottom two combinations illustrated in Figure 4 will be resolved however.
In these combinations, the probes binding to adjacent sites are differently labeled, and provide
either a green-red or a red-green pattern. Because of the ability to resolve the position of
probes in this latter situation, useful sequence information from the DNA molecules can be
achieved and high throughput linear analysis is possible.

Figure 5 illustrates the signals that can be achieved from the DNA molecules of Figure
4 with the additional feature of a backbone stained with a blue intercalating compound.
Signals 1 and 2 correspond to the top two probe placements in Figure 4. Signals 3 and 4
correspond to the bottom two probe placements in Figure 4. Signals 3 and 4 allow for
dual-color increased resolution of adjacent tags, and thus only these probe arrangements are useful
per se in deriving sequence information. If the identical sequence probes are labeled with the
same label, then the probes cannot be distinguished from each other if the distance between
them is less than the spatial resolution limit. If, however, identical sequence probes labeled
with different labels are used, the adjacent sites can be discerned from each other even if the
distance between the sites (and thus the bound probes) is less than the spatial resolution limit.

The resolution offered by differential tagging is much greater than that offered by
conventional tagging approaches using only one type of tag. Lacoste et al. (Lacoste, T.D. et
al., Proc. Nat'l Acad. Sci. 97(17):9461-9466) give insight into the order of magnitude of the
resolution that can be attained using a fluorescence-based detection system. In this study,
Lacoste et al. report high-resolution interdistance determination between fluorescent species
excited with a single excitation wavelength and with emissions at two different wavelengths.
For instance, fluorescent transfluorospheres (TFS) and semiconductor nanocrystals (NC) were
illuminated with a single laser line and particles emitting at two different spectral ranges
could be distinguished when situated at least 25 nm apart. The general range of resolution
was from 25 nm to 75 nm for the two-color imaging approach of Lacoste et al. The current
invention takes this work several stages further in proposing a general high-resolution
analysis method of linear DNA, either fixed or moving in a flow-based system. The second
major part of the current invention is that various combinations of tagging strategies can allow
the use of a differential tagging approach to interrogate information on DNA in a
high-resolution, rapid manner.

This minimal spatially resolvable distance corresponds approximately to 70 base-pairs
of information. Thus, the method of the invention can be used to determine sequence
information at 70 base pair intervals. This is a vast improvement over the current limit of 782
base pair intervals. Resolution of probes within 70 base pairs of each other approximates the
distance between 3-mer probes (i.e., 64 base pairs between randomly placed probes of 3
nucleotides in length). The resolution that can be achieved using the methods of the present
invention also enables the use of 4-mer probes which are randomly located within 256 base
pairs of each other. This level of resolution could not be attained using the single color
analysis methods of the prior art.

In the example of Figure 4, each of the probe combinations has a 25% probability of
occurring. Only one half of these combinations, however, will yield a valuable result. This
probability is approximate, since there will be instances in which only one of the
adjacent sites is bound to a probe, and the other is free (or unlabeled), or in which only one
site is detected by the system. Binding of probes can be maximized by increasing the ratio of
probe to target molecule so that the amount of probe is not limiting. Moreover, binding
efficiency can be increased by optimizing hybridization conditions for a given
probe sequence.

The invention contemplates other approaches to increasing the efficiency of probe
detection. For instance, the complexity of the probe mixture and the number of colors present
in the probe mixture can be increased. As an example, a mixture of three differently labeled
yet identical sequence probes can be used rather than the two label combination described
above. If the mixture contains equal numbers of, for example, green labeled probes, red
labeled probes and blue labeled probes, then each color of probe has a 33% chance of binding to
either of the adjacent sites, assuming that binding at each site is independent of binding at the
other. It follows then that binding of two adjacent sites by three different probes would result
in 3^2 (i.e., 9) possible combinations of colors occupying the two sites. Three of these
combinations will be unresolved combinations of colors because the same colored probe will
bind to both sites. However, approximately 67% of the possible combinations will yield usable
information. This is an increase over the 50% of usable combinations that could be achieved
using only a two color probe mixture. A mixture of four differently labeled yet identical
sequence probes similarly will yield 75% usable possible combinations. If the probe mixture
contains 100 differently labeled probes, then 99% of the possible combinations yield useful
information. Accordingly, the invention intends to embrace probe mixtures having 2, 3, 4, 5,
6, 7, 8, 9, 10, 20, 25, 30, 50, 75, 100 or more differently labeled probes.
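
The usable fractions quoted above follow from a simple count: with n equally represented labels and two adjacent sites, n of the n^2 equally likely color pairs are same-color pairs, leaving a fraction 1 - 1/n that can be resolved. The following is a minimal sketch of that count. It assumes independent binding at each site and full occupancy of both sites, and the function name is illustrative rather than part of the disclosure.

```python
from itertools import product

def usable_fraction(n_labels: int) -> float:
    """Fraction of color pairs on two adjacent sites whose colors differ."""
    labels = range(n_labels)
    pairs = list(product(labels, repeat=2))        # n^2 equally likely pairs
    resolved = [p for p in pairs if p[0] != p[1]]  # differently colored neighbors
    return len(resolved) / len(pairs)

for n in (2, 3, 4, 100):
    print(n, round(usable_fraction(n), 2))
```

Under these assumptions the sketch reproduces the 50%, 67%, 75% and 99% figures given for mixtures of 2, 3, 4 and 100 labels.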

As discussed above, the invention contemplates the use of probes of varying lengths
including 3-mers, 4-mers, 5-mers, 6-mers, etc. If a 6-mer sequence recognition tag is used,
and if nucleotides are randomly distributed throughout the genome, then any given 6-mer
sequence would be predicted to occur every 4^6 or 4096 base-pairs. Since the genome is not
random, it is expected that there will be a range of distances between target sequences on the
target DNA (and accordingly, a range of distances between bound probes on such a target
DNA). This range might span from a few base-pairs to more than ten-thousand base-pairs.
The greater resolution provided by the differential tagging system described herein allows
resolution of 6-mer sequences that might occur within 4096 base pairs of each other. The
high resolution method of the invention would not sacrifice the speed of passage of DNA
molecules through the channel systems, particularly when probes are located within a short
distance of each other. In fact, as stated above, probes located as close as 70 base pairs from
each other should be resolvable using the methods described herein.
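
The spacings cited for 3-mer, 4-mer and 6-mer probes all come from the same rule: in a random sequence with equal base frequencies, a given k-mer is expected once every 4^k base pairs. The sketch below tabulates that rule against the 70 base pair resolution stated in the text. The uniform random sequence model is an idealization (the text itself notes that genomes are not random), and the names used are illustrative.

```python
def expected_spacing_bp(k: int) -> int:
    """Mean spacing, in base pairs, between occurrences of one k-mer in random sequence."""
    return 4 ** k

RESOLUTION_BP = 70  # resolvable distance quoted in the text

for k in (3, 4, 5, 6):
    spacing = expected_spacing_bp(k)
    print(f"{k}-mer: one site per {spacing} bp on average "
          f"({spacing / RESOLUTION_BP:.1f} x the 70 bp resolution)")
```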

For instance, if the lambda genome is analyzed using the sequence specific tag
GAATTC (6 base-pairs), then target sites will be separated from each other in the lambda
genome by 3530, 4878, 5643, 5804, 7421 and 21226 base-pairs (including the end-tags of the
lambda DNA). In this one example, there is a wide range of distances that may separate
6-mer probes or tags. Using a 6-mer probe having the sequence of AAGCTT, target sites will
be separated from each other in the lambda genome by 2027, 2322, 4361, 6557, 9416 and
23130 base-pairs (including the end-tags of the lambda DNA). An optical system that was
limited to a spatial resolution of 3000 base-pairs prior to the discovery of the invention would
be aided by the use of multi-color high resolution analysis and differential labeling of the
same sites with different color tags. One of the advantages of this method of labeling is its
ability to reduce the amount of DNA lost during throughput through the system.
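
To make the lambda example concrete, the sketch below checks the listed site-to-site distances against the 3000 base pair resolution of the single-color system in the example. The distances are copied from the text; the function and variable names are illustrative.

```python
RESOLUTION_BP = 3000  # resolution of the single-color optical system in the example

# Distances between neighboring sites (including end tags), as listed in the text.
site_spacings_bp = {
    "GAATTC": [3530, 4878, 5643, 5804, 7421, 21226],
    "AAGCTT": [2027, 2322, 4361, 6557, 9416, 23130],
}

def unresolved_spacings(spacings, resolution_bp):
    """Return the spacings that are shorter than the resolution limit."""
    return [d for d in spacings if d < resolution_bp]

for tag, spacings in site_spacings_bp.items():
    print(tag, unresolved_spacings(spacings, RESOLUTION_BP))
```

With these numbers, every GAATTC spacing exceeds 3000 base pairs, while two of the AAGCTT spacings (2027 and 2322 base pairs) fall below it. Those are the neighbors a single-color system would merge and differential labeling could separate.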

The ability to multiplex different sequences using different colors is diminished in the
differential tagging method of the invention relative to monochromatic methods of labeling.
In addition, a greater number of fragments of a given sequence need to be sampled using the
differential tagging method in order to attain results identical to those of the monochromatic
method. In an important embodiment, more than one color is assigned to a particular sequence.
For example, if the high resolution method described herein employs four different colors, two
or more of these colors will be assigned to a particular sequence. In so doing, the number of
different sequences that can be analyzed at a given time is reduced. In the monochromatic
method, each sequence can be assigned to a different color and accordingly, a greater number
of sequences can be analyzed at the same time. The drawback of the monochromatic method
is that contiguous target sequences that are situated within the resolution detection limit of the
system are not detected. Using the high resolution method of the invention, fewer sequences
can be analyzed at a given time and more sample runs will be required; however, greater
spatial resolution can be achieved. Accordingly, there is a trade-off between being able to
analyze at higher spatial resolution and diminished multiplex capability, and a higher
sampling requirement. Throughput estimates of the efficiency of differential tagging have
been derived, and are presented below.

Assuming a monochromatic system that has a raw DNA delivery throughput of 10 
million base-pairs per second (MB/s), a calculation for the amount of time to analyze one 
human genome at 10-fold coverage (i.e., analysis of 10 copies of the human genome) takes 
into consideration the following parameters: 


Parameter                                     Value

number of base-pairs in human genome          2 × (3 × 10^9) = 6 × 10^9 bp
% time actually collecting data from DNA      20%
number of copies of human genome analyzed     10×


The total time to analyze a genome using a 10 MB/s data rate is calculated as follows:

(number of base-pairs in human genome)
× (number of copies of human genome analyzed)
÷ (throughput rate)
÷ (% of time actually spent collecting data from DNA)
÷ 60 ÷ 60

Taking into consideration the values provided for this system, it would take 8.33 hours in a
single detector collection system (i.e., a monochromatic system) to collect the information
from a human genome sample, with 10-fold coverage. In a situation in which one six-mer is
used, the amount of nucleic acid that can be sequenced in a single run is 6/4096 × (the length
of the molecule or genome).
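
The 8.33 hour figure can be checked by evaluating the calculation with the tabulated parameter values. The sketch below does so; the function and argument names are illustrative, and the two divisions by 60 convert seconds to hours.

```python
def genome_analysis_hours(genome_bp, coverage, throughput_bp_per_s, duty_fraction):
    """Hours needed to read `coverage` copies of a genome of `genome_bp` base pairs."""
    seconds = genome_bp * coverage / throughput_bp_per_s / duty_fraction
    return seconds / 60 / 60

hours = genome_analysis_hours(
    genome_bp=6e9,             # 2 x (3 x 10^9) bp
    coverage=10,               # 10-fold coverage
    throughput_bp_per_s=10e6,  # 10 million base pairs per second
    duty_fraction=0.20,        # 20% of time spent collecting data
)
print(f"{hours:.2f} hours")
```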

Suppose instead that the system uses at least two differentially labeled probes. There
is a finite probability that there are more than two target 6-mer sites within the optical
resolution limit of the system. In order to determine the number of probes in the optical
detection volume, it is necessary to determine the probability at which these probes are
present in their alternating color schemes. An example of a situation in which there are three
adjacent target sites and a mixture of three differently labeled (but identical sequence) probes
is shown in Figure 6. In order to determine that there are three sites within the optical probe
volume by probability, the three sites must be occupied by alternating colored probes. In the
case in which there are only two differently labeled probes, this happens 2 out of 2^3 (i.e., 2 out
of 8) times, or in 25% of the target DNA molecules.
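
The 2-in-8 figure can be confirmed by enumerating every color pattern across three sites and keeping those in which no two neighboring sites share a color. The sketch below does this; treating "alternating" as "adjacent sites always differ" is an assumption made here for illustration, and it assumes equal label proportions and full occupancy.

```python
from itertools import product

def alternating_fraction(n_sites: int, colors: str) -> float:
    """Fraction of color patterns in which every pair of neighboring sites differs."""
    patterns = list(product(colors, repeat=n_sites))
    alternating = [p for p in patterns
                   if all(a != b for a, b in zip(p, p[1:]))]
    return len(alternating) / len(patterns)

print(alternating_fraction(3, "RG"))   # two labels, three sites
print(alternating_fraction(3, "RGB"))  # three labels, three sites
```

For two labels only RGR and GRG qualify, giving 2 of 8 patterns. Under the same definition, three labels give 12 of 27 patterns.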

The binding and detection efficiency will also affect the time required to perform the
analysis. Assuming that this combined binding and detection efficiency is 90%, then there
will be a 53% probability that all three sites will be occupied. This means that 12.5% of
target molecules will be bound by the correct pattern of differently labeled probes required to
determine if there are three target sites (and thus three bound probes) in the volume. It
follows that a 10-fold coverage (i.e., analysis of 10 target molecules for each genome sample)
is minimally sufficient to capture these less frequent events. In order to achieve statistically
significant data, it might be necessary to analyze more than 10 copies per genome sample, and
this in turn would lead to longer analysis times per genome.
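
The following sketch shows one way the quoted percentages can be reproduced. It assumes, as an interpretation not stated in the text, that binding and detection are each 90% efficient at each of the three sites and that the sites are independent; the names are illustrative.

```python
def fraction_informative(p_pattern, p_bind, p_detect, n_sites):
    """Return (P(all sites bound and detected), P(informative molecule))."""
    p_all_sites_seen = (p_bind * p_detect) ** n_sites
    return p_all_sites_seen, p_pattern * p_all_sites_seen

# 0.25 is the fraction of alternating patterns for two labels on three sites.
p_seen, p_useful = fraction_informative(0.25, 0.90, 0.90, 3)
print(f"all sites bound and detected: {p_seen:.3f}")
print(f"informative molecules: {p_useful:.3f}")
```

Under this reading the first value is about 0.53, matching the 53% in the text, and the second is about 0.13, close to the 12.5% obtained by rounding 53% to one half.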

The enhanced resolution that can be achieved through differential tagging can be
compared to that achievable using other methods of enhanced resolution, such as near-field
analysis of the DNA. An examination of the real properties of a fluorophore in transit through
a confocal illumination region demonstrates that a single fluorophore (in this one example)
emits between 15-20 counts per bin. The DNA is travelling at 10,000 µm/sec with an
approximate confocal spot size of 1 µm. The sampling rate is 10 kHz. Therefore, each
fluorophore spends 1 binwidth, or 100 µs, in the confocal laser spot. By decreasing the spot
size, we decrease the captured signal proportionally. For instance, suppose that we decrease
the illumination region to 200 nm using near-field analysis. This is 20% of the original
illumination region. The pass-through rate of DNA would have to be reduced five-fold to ensure
that the fluorescent probes are captured by the system. Alternatively, the sampling rate can be
increased to 50 kHz to ensure that the passage of the individual fluorophores through the
system is captured. A five-fold increase in the sampling rate leads to a decreased signal
capture rate of 3-4 counts per bin. Trade-offs in using a smaller excitation volume include a
decreased signal-to-noise ratio and/or a decreased throughput rate.
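
The near-field trade-off can be worked through numerically. At 10,000 µm/sec a 1 µm spot is crossed in 100 µs, which is one bin at 10 kHz. The sketch below repeats that arithmetic for the 200 nm spot. It assumes that counts per bin scale linearly with bin duration, and the function names are illustrative.

```python
def dwell_time_s(spot_um, velocity_um_per_s):
    """Time a fluorophore spends inside the illumination spot."""
    return spot_um / velocity_um_per_s

def bins_per_transit(spot_um, velocity_um_per_s, sampling_hz):
    """Number of sampling bins that fit within one transit of the spot."""
    return dwell_time_s(spot_um, velocity_um_per_s) * sampling_hz

def scaled_counts(counts_per_bin, old_sampling_hz, new_sampling_hz):
    """Counts per bin after changing the sampling rate (shorter bins collect fewer counts)."""
    return counts_per_bin * old_sampling_hz / new_sampling_hz

print(bins_per_transit(1.0, 10_000, 10e3))  # confocal spot, 10 kHz
print(bins_per_transit(0.2, 10_000, 10e3))  # 200 nm spot, 10 kHz: a fraction of a bin
print(bins_per_transit(0.2, 10_000, 50e3))  # 200 nm spot, 50 kHz
print(scaled_counts(15, 10e3, 50e3), scaled_counts(20, 10e3, 50e3))
```

The last line corresponds to the 3-4 counts per bin cited for the five-fold higher sampling rate.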

In contrast, the differential tagging method of the invention involves a different set of
trade-offs. The signal-to-noise ratio and also potentially the throughput passage rate of the
DNA molecules are not reduced. Instead, it is estimated that a larger number of molecules
should be analyzed to give the same statistically significant positional information from the
probes. As the previous example shows, to obtain identical statistics
on the population set, the sample would need to be analyzed at a 10× greater redundancy. In
small sample sizes, such as the analysis of small genomes up to several tens of MB (million
base-pairs), this approach would be feasible for higher resolution. This approach may also be
feasible for much larger genomes, if the trade-off in lesser statistics is immaterial. Figure 7
illustrates the signals detected by running a lambda molecule through the system. The lambda
molecule is labeled at two sites, and there are two detection regions (one for the backbone
label and one for the probe).

Example 6: Scanning-based methods for high-resolution determination of sequences. 

The above described methods of tagging DNA using differential labels also apply to
fixed methods of DNA analysis. In these latter methods, the DNA is tagged differentially,
and then either scanned or imaged. In a fluorescence-based system, for instance, the above
example of resolving three sites within the optical resolution of the system would provide a
representative image such as that shown in Figure 8.

In this one example, three simultaneous images are captured from the spectrally
separated signals arising from the sample. The images are then overlaid and the spatial
positions determined with sub-optical resolution accuracy. The center of each of the
emissions from the sequence-specific tags and the center-to-center distance spacing are
determined from the captured image. It is expected that there will be a population of
molecules in the image that represent molecules having different combinations of fluorescent
tags attached to the target sites. In the images described above, it is expected that there would
be combinations of RRR, GGG, RGR, GRG, RRG, and so on, where R = red and G = green.
A large enough number of molecules should be sampled in order to obtain a sufficient number
of molecules with alternating patterns of differentially labeled tags. The alternating pattern
allows deconvolution of the number of tags within the optical resolution limit.
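
As an illustration of the center-to-center measurement described above, the sketch below finds the center of each spectrally separated emission as an intensity-weighted centroid and then takes the distance between centers. The centroid estimator, the intensity profiles and the pixel size are all hypothetical choices made for this example; the text does not specify how the centers are computed.

```python
def centroid(profile):
    """Intensity-weighted center of a 1-D intensity profile, in pixel units."""
    total = sum(profile)
    return sum(i * v for i, v in enumerate(profile)) / total

# Hypothetical, overlapping emission profiles along the DNA axis, one per color channel.
red   = [0, 1, 4, 9, 4, 1, 0, 0, 0]
green = [0, 0, 1, 4, 9, 4, 1, 0, 0]

NM_PER_PIXEL = 50  # hypothetical pixel size

separation_nm = abs(centroid(green) - centroid(red)) * NM_PER_PIXEL
print(separation_nm)
```

Because each channel is analyzed separately, the two centers can be located even though the summed red and green profiles would appear as a single broad spot.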

Equivalents

The foregoing written specification is to be considered to be sufficient to enable one
skilled in the art to practice the invention. The examples disclosed herein are not to be
construed as limiting of the invention as they are intended merely as illustrative of particular
embodiments of the invention as enabled herein. Therefore, systems that are functionally
equivalent to those described herein are within the spirit and scope of the claims appended
hereto. Indeed, various modifications of the invention in addition to those shown and
described herein will become apparent to those skilled in the art from the foregoing
description and fall within the scope of the appended claims.


Claims 

1. A method for analyzing a polymer comprising:

a) providing a detection station having a known detection resolution; 

b) labeling the polymer with first and second unit specific markers, the first 
unit specific marker including a first label and the second unit specific marker including a 
second label distinct from the first label, wherein the first and second unit specific markers are 
spaced apart on the polymer such that, if the labels were not distinct from each other, they 
would be separated by a distance less than the detection resolution; 

c) exposing the polymer labeled as in (b) to the detection station to produce 
distinct first and second signals arising from the first and second labels; and 

d) identifying the distinct first and second signals. 

2. The method of claim 1, wherein the first unit specific marker is different from
the second unit specific marker.

3. The method of claim 1, wherein the first unit specific marker is identical to the
second unit specific marker.

4. The method of claim 1, wherein the first unit specific marker and the second
unit specific marker are positioned immediately adjacent to one another.

5. The method of claim 1, wherein the first unit specific marker and the second
unit specific marker are spatially separated from one another by at least two units.

6. The method of claim 1, wherein the polymer is labeled with a third unit
specific marker comprising a third label.

7. The method of claim 6, wherein the third unit specific marker is spaced apart
from the first and second unit specific markers by a distance greater than the known detection
resolution.


8. The method of claim 1, wherein the first and second unit specific markers are
nucleic acid molecules.

9. The method of claim 1, wherein the first and second unit specific markers are
peptide nucleic acid molecules or locked nucleic acid molecules.

10. The method of claim 8, wherein the first and second unit specific markers have
an identical nucleotide sequence.

11. The method of claim 8, wherein the first and second unit specific markers are
less than 12 bases in length.

12. The method of claim 8, wherein the first and second unit specific markers are
at least 4 bases in length.

13. The method of claim 1, wherein the first label and second label are
independently selected from the group consisting of an electron spin resonance molecule, a
fluorescent molecule, a chemiluminescent molecule, a radioisotope, an enzyme substrate, an
enzyme, a biotin molecule, an avidin molecule, an electrical charge transferring molecule, a
semiconductor nanocrystal, a semiconductor nanoparticle, a colloid gold nanocrystal, a
ligand, a microbead, a magnetic bead, a paramagnetic molecule, a quantum dot, a
chromogenic substrate, an affinity molecule, a protein, a peptide, a nucleic acid, a
carbohydrate, a hapten, an antigen, an antibody, an antibody fragment, and a lipid.

14. The method of claim 1, wherein the signals are detected using a detection
system selected from the group consisting of an electron spin resonance (ESR) detection
system, a charge coupled device (CCD) detection system, a fluorescent detection system, an
electrical detection system, an electromagnetic detection system, a photographic film
detection system, a chemiluminescent detection system, an enzyme detection system, an
atomic force microscopy (AFM) detection system, a scanning tunneling microscopy (STM)
detection system, an optical detection system, a nuclear magnetic resonance (NMR) detection
system, a near field detection system, and a total internal reflection (TIR) detection system.

15. The method of claim 1, wherein the polymer is a nucleic acid molecule.

16. The method of claim 1, wherein the polymer is genomic DNA.

17. The method of claim 1, wherein the polymer comprises a backbone that
includes a label.


18. A system for optically analyzing a polymer of linked units comprising: 

a) an optical source for emitting optical radiation of a known wavelength; 

b) an interaction station for receiving the optical radiation in an optical path and for
sequentially receiving units of the polymer that are exposed to the optical radiation to produce
detectable signals;

c) dichroic reflectors in the optical path for creating at least two separate wavelength
bands of the detectable signals;

d) optical detectors constructed to detect radiation including the signals resulting from
interaction of the units with the optical radiation; and

e) a processor constructed and arranged to analyze the polymer based on the detected
radiation including the signals.

19. The system of claim 18, wherein the units of the polymer are labeled with at 
least two radiation sensitive labels. 


20. The system of claim 18, wherein the interaction station includes a slit having a
slit width in the range of 1 nm to 500 nm, the slit producing a localized radiation spot.

21. The system of claim 20, wherein the slit width is in the range of 10 nm to 100
nm.

22. The system of claim 18, wherein the interaction station includes a
microchannel and a slit having a submicron width arranged to produce the localized radiation
spot, the microchannel being constructed to receive and advance the polymer units through
the localized radiation spot.

23. The system of claim 21, further including a polarizer and wherein the optical
source includes a laser constructed to emit a beam of radiation, the polarizer being arranged to
polarize the beam prior to reaching the slit.

24. The system of claim 21, wherein the polarizer is arranged to polarize the beam
in parallel to the width of the slit.

25. A method for analyzing a polymer of linked units comprising:

a) providing a microchannel;

b) generating optical radiation of a known wavelength to produce a localized
radiation spot at the microchannel to define a detection station having a known detection
resolution;

c) labeling the polymer with first and second unit specific markers, the first
unit specific marker including a first label and the second unit specific marker including a
second label distinct from the first label, wherein the markers are spaced apart on the polymer
such that, if the labels were not distinct from each other, they would be separated by a
distance less than the detection resolution;

d) sequentially exposing the first and second labels to the localized radiation
spot;

e) sequentially detecting radiation of at least two distinct wavelength bands
resulting from interaction of the first and second labels with the localized radiation spot; and

f) analyzing the polymer using the detected wavelength bands.

26. The method of claim 25, further comprising applying an electric field to move
the polymer through the microchannel.

27. The method of claim 25, further comprising applying pressure to move the
polymer through the microchannel.

28. The method of claim 25, further comprising applying suction to move the
polymer through the microchannel.

29. The method of claim 25, wherein the first and second labels are independently
selected from the group consisting of an electron spin resonance molecule, a fluorescent
molecule, a chemiluminescent molecule, a radioisotope, an enzyme substrate, an enzyme, a
biotin molecule, an avidin molecule, an electrical charge transferring molecule, a
semiconductor nanocrystal, a semiconductor nanoparticle, a colloid gold nanocrystal, a
ligand, a microbead, a magnetic bead, a paramagnetic molecule, a quantum dot, a
chromogenic substrate, an affinity molecule, a protein, a peptide, a nucleic acid, a
carbohydrate, a hapten, an antigen, an antibody, an antibody fragment, and a lipid.

30. The method of claim 25, wherein the first and second labels are fluorophores.


31. The method of claim 25, wherein the detecting includes collecting the first and 
second signals arising from the first and second labels while the first and second unit specific 
markers are moving through the microchannel. 

32. The method of claim 25, wherein the first unit specific marker is different from 
the second unit specific marker. 


33. The method of claim 25, wherein the first unit specific marker is identical to
the second unit specific marker.

34. The method of claim 25, wherein the first unit specific marker and the second
unit specific marker are positioned immediately adjacent to one another.

35. The method of claim 25, wherein the first unit specific marker and the second
unit specific marker are spatially separated from one another by at least two units.

36. The method of claim 25, wherein the polymer is labeled with a third unit 
specific marker, including a third label. 


37. The method of claim 36, wherein the third unit specific marker is spaced apart
from the first and second unit specific markers by a distance greater than the minimum
detection resolution.

38. The method of claim 25, wherein the first and second unit specific markers are
nucleic acid molecules.

39. The method of claim 25, wherein the first and second unit specific markers are
peptide nucleic acid molecules or locked nucleic acid molecules.

40. The method of claim 38, wherein the first and second unit specific markers
have an identical nucleotide sequence.

41. The method of claim 38, wherein the first and second unit specific markers are
less than 12 bases in length.

42. The method of claim 38, wherein the first and second unit specific markers are
at least 4 bases in length.

43. The method of claim 25, wherein the first label and second label are
independently selected from the group consisting of an electron spin resonance molecule, a
fluorescent molecule, a chemiluminescent molecule, a radioisotope, an enzyme substrate, an
enzyme, a biotin molecule, an avidin molecule, an electrical charge transferring molecule, a
semiconductor nanocrystal, a semiconductor nanoparticle, a colloid gold nanocrystal, a
ligand, a microbead, a magnetic bead, a paramagnetic molecule, a quantum dot, a
chromogenic substrate, an affinity molecule, a protein, a peptide, nucleic acid, a carbohydrate,
a hapten, an antigen, an antibody, an antibody fragment, and a lipid.

44. The method of claim 25, wherein the signals are detected using a detection
system selected from the group consisting of an electron spin resonance (ESR) detection
system, a charge coupled device (CCD) detection system, a fluorescent detection system, an
electrical detection system, an electromagnetic detection system, a photographic film
detection system, a chemiluminescent detection system, an enzyme detection system, an
atomic force microscopy (AFM) detection system, a scanning tunneling microscopy (STM)
detection system, an optical detection system, a nuclear magnetic resonance (NMR) detection
system, a near field detection system, and a total internal reflection (TIR) detection system.

45. The method of claim 25, wherein the polymer is a nucleic acid molecule.

46. The method of claim 25, wherein the polymer is genomic DNA. 

47. The method of claim 25, wherein the polymer comprises a backbone that 
includes a label. 

48. A method for analyzing a polymer comprising 

labeling a polymer with a set of unit specific markers, wherein each unit specific 
marker of the set recognizes and binds to units of identical sequence within the polymer and 
wherein each unit specific marker is labeled with one of at least two distinct labels, and 

detecting signals arising from the labels to analyze the polymer. 


49. The method of claim 48, wherein about 50% of the unit specific markers are
labeled with a first label and about 50% of the unit specific markers are labeled with a second
label.

50. The method of claim 48, wherein each unit specific marker is labeled with one
of at least three distinct labels.

51. The method of claim 48, wherein each unit specific marker is labeled with one
of at least four distinct labels.

52. The method of claim 48, wherein the unit specific markers are nucleic acid
molecules.

53. The method of claim 48, wherein the unit specific markers are peptide nucleic 
acid molecules or locked nucleic acid molecules. 


54. The method of claim 52, wherein the unit specific markers have identical
sequence.

55. The method of claim 52, wherein the unit specific markers are less than 12
bases in length.

56. The method of claim 48, wherein the labels are of a type selected from the
group consisting of an electron spin resonance molecule, a fluorescent molecule, a
chemiluminescent molecule, a radioisotope, an enzyme substrate, an enzyme, a biotin
molecule, an avidin molecule, an electrical charge transferring molecule, a semiconductor
nanocrystal, a semiconductor nanoparticle, a colloid gold nanocrystal, a ligand, a microbead, a
magnetic bead, a paramagnetic molecule, a quantum dot, a chromogenic substrate, an affinity
molecule, a protein, a peptide, a nucleic acid, a carbohydrate, a hapten, an antigen, an
antibody, an antibody fragment, and a lipid.

57. The method of claim 48, wherein the distinct labels are of different types. 




WO 03/025540 


PCT/US02/29687 


[Figure 1 (sheet 1/8): two panels, each showing a DNA molecule; one is labeled "Not resolved" and the other "Resolved".]

[Figure 2 (sheet 2/8): sites labeled "Sequence A" and "Sequence B".]

[Figure 3 (sheet 3/8): two sites, both labeled "Sequence A".]

[Figure 4 (sheet 4/8): four arrangements, each with two sites labeled "Sequence A".]

[Sheet 5/8: no legible text recovered.]

[Figure 6 (sheet 6/8): label "optical probe volume".]

[Figure 7 (sheet 7/8): no legible text recovered.]

[Figure 8 (sheet 8/8): "Optical resolution limit without differential tagging approaches described"; separate blue, red and green images are overlaid to give a "composite high resolution image".]


(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 


(19) World Intellectual Property Organization 
International Bureau 

(43) International Publication Date: 27 March 2003 (27.03.2003)

(10) International Publication Number: WO 03/025540 A3

(51) International Patent Classification 7: C12Q 1/68

(21) International Application Number: PCT/US02/29687

(22) International Filing Date: 18 September 2002 (18.09.2002)

(25) Filing Language: English

(26) Publication Language: English

(30) Priority Data: 60/322,981, 18 September 2001 (18.09.2001), US

(71) Applicant: U.S. GENOMICS, INC. [US/US]; 6H Gill
Street, Woburn, MA 01801 (US).

(72) Inventors: CHAN, Eugene, Y.; 133 Park Street, #1001, 
Brookline, MA 02246 (US). FUCHS, Martin; 11 Sophia 
Drive, Uxbridge, MA 01569 (US). GILMANSHIN, 
Rudolf; 15 Keach Street, Waltham, MA 02453 (US). 


(81) Designated States (national): AE, AG, AL, AM, AT, AU, 
AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CO, CR, CU, 
CZ, DE, DK, DM, DZ, EC, EE, ES, FI, GB, GD, GE, GH, 
GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, 
LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, 
MX, MZ, NO, NZ, OM, PH, PL, PT, RO, RU, SD, SE, SG, 
SI, SK, SL, TJ, TM, TN, TR, TT, TZ, UA, UG, UZ, VC, 
VN, YU, ZA, ZW. 

(84) Designated States (regional): ARIPO patent (GH, GM, 
KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZM, ZW), 
Eurasian patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), 
European patent (AT, BE, BG, CH, CY, CZ, DE, DK, EE, 
ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE, SK, 
TR), OAPI patent (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, 
GW, ML, MR, NE, SN, TD, TG). 

Published: 

— with international search report 

(88) Date of publication of the international search report: 

16 October 2003 


(74) Agent: TREVISAN, Maria, A.; Wolf, Greenfield & Sacks, P.C., 600 Atlantic Avenue, Boston, MA 02210 (US). 

For two-letter codes and other abbreviations, refer to the "Guidance Notes on Codes and Abbreviations" appearing at the beginning of each regular issue of the PCT Gazette. 


(54) Title: DIFFERENTIAL TAGGING OF POLYMERS FOR HIGH RESOLUTION LINEAR ANALYSIS 





(57) Abstract: The invention provides methods and systems for improved spatial resolution of signal detection, particularly as applied 
to the analysis of polymers such as biological polymers. The methods and systems comprise differentially tagging polymers 
in order to increase resolution. 




INTERNATIONAL SEARCH REPORT 

A. CLASSIFICATION OF SUBJECT MATTER 
IPC(7) : C12Q 1/68 

US CL : 435/6 

According to International Patent Classification (IPC) or to both national classification and IPC 


Minimum documentation searched (classification system followed by classification symbols) 


Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched 


Electronic data base consulted during the international search (name of data base and, where practicable, search terms used) 


DOCUMENTS CONSIDERED TO BE RELEVANT 


Category* 


Citation of document, with indication, where appropriate, of the relevant passages 
US 6,246,046 B1 (LANDERS et al.) 12 June 2001 (12.06.2001), see entire document. 

US 6,225,067 B1 (ROGERS) 01 May 2001 (01.05.2001), see entire document. 

US 5,422,271 A (CHEN et al.) 06 June 1995 (06.06.1995), see entire document. 


Further documents are listed in the continuation of Box C. 


Special categories of cited documents: 

"A" document defining the general state of the art which is not considered to 
be of particular relevance 

"E" earlier application or patent published on or after the international 
filing date 

"L" document which may throw doubts on priority claim(s) or which is cited 
to establish the publication date of another citation or other special reason 
(as specified) 

"O" document referring to an oral disclosure, use, exhibition or other 
means 

"P" document published prior to the international filing date but later than the 
priority date claimed 

Date of the actual completion of the international search 


20 February 2003 (20.02.2003) 
Name and mailing address of the ISA/US 

Commissioner of Patents and Trademarks 
Box PCT 

Washington, D.C. 20231 

Facsimile No. (703)305-3230 


Form PCT/ISA/210 (second sheet) (July 1998) 


Relevant to claim No. 
1-17 


See patent family annex. 


later document published after the international filing date or 

priority date and not in conflict with the application but cited to 
understand the principle or theory underlying the invention 

document of particular relevance; the claimed invention cannot be 
considered novel or cannot be considered to involve an inventive 
step when the document is taken alone 

document of particular relevance; the claimed invention cannot be 
considered to involve an inventive step when the document is 
combined with one or more other such documents, such 
combination being obvious to a person skilled in the art 

document member of the same patent family 


Date of mailing of the international search report 

13 MAR 2003 

Authorized officer 
[illegible] L. Sisson 
Telephone No. (703) 308-0196 




INTERNATIONAL SEARCH REPORT 


International application No. PCT/US02/29687 


Box I Observations where certain claims were found unsearchable (Continuation of Item 1 of first sheet) 

This international report has not been established in respect of certain claims under Article 17(2)(a) for the following reasons: 

1. □ Claim Nos.: 

because they relate to subject matter not required to be searched by this Authority, namely: 


2. □ Claim Nos.: 

because they relate to parts of the international application that do not comply with the prescribed requirements to 
such an extent that no meaningful international search can be carried out, specifically: 


3. □ Claim Nos.: 

because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 
6.4(a). 


Box II Observations where unity of invention is lacking (Continuation of Item 2 of first sheet) 


This International Searching Authority found multiple inventions in this international application, as follows: 
Please See Continuation Sheet 


1. □ As all required additional search fees were timely paid by the applicant, this international search report covers all 
searchable claims. 

2. □ As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite 
payment of any additional fee. 

3. □ As only some of the required additional search fees were timely paid by the applicant, this international search 
report covers only those claims for which fees were paid, specifically claims Nos.: 


4. ☒ No required additional search fees were timely paid by the applicant. Consequently, this international search report 
is restricted to the invention first mentioned in the claims; it is covered by claims Nos.: 1-17 


Remark on Protest 


The additional search fees were accompanied by the applicant's protest. 
No protest accompanied the payment of additional search fees. 
BOX II. OBSERVATIONS WHERE UNITY OF INVENTION IS LACKING 

This application contains the following inventions or groups of inventions which are not so linked as to form a single 
Group I, claim(s) 1-17, drawn to a method of analyzing any polymer by use of a detection station. 
Group II, claim(s) 18-24, drawn to a system for optically analyzing a polymer of linked units. 
Group III, claim(s) 25-47, drawn to a method of analyzing a polymer through use of microchannels. 
Group IV, claim(s) 48-57, drawn to a method of analyzing any polymer with labeled markers that bind to specific sequences. 
Accordingly, the inventions are not linked by a special technical feature such that they have unity of invention. 